Re: Whitelist or BAYES?

2024-10-01 Thread Bill Cole

On 2024-09-30 at 16:22:49 UTC-0400 (Mon, 30 Sep 2024 16:22:49 -0400)
joe a 
is rumored to have said:


On 9/27/2024 04:05:51, Matus UHLAR - fantomas wrote:

On 26.09.24 10:27, joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a local town 
Library (monthly) gets marked as spam as shown below.
The BAYES_99 and BAYES_999 values are something I am toying with for 
other reasons.  Seems odd these should hit either one of those 
tests.


So, on the one hand I can add them to the whitelist and be done with it, or on 
the other I can add them to missed HAM for re-learning.

Which is the best approach?


so far, both. You may need to relearn multiple of their (monthly) mails 
before it has an effect.
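Concretely, both steps might look like this (the sender address below is a placeholder, and `welcomelist_from` is the SA 4.x spelling; 3.4.x installs use `whitelist_from`):

```
# local.cf -- exempt the library's sender address (placeholder domain)
welcomelist_from *@library.example.org
```

The re-learning side is done from the command line against the saved messages, e.g. `sa-learn --ham /path/to/saved-library-mail/`.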



X-Spam-Report:
*  4.1 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*      [score: 1.]
*  5.0 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
*      [score: 1.]


You have raised BAYES_99 and BAYES_999 to huge values, so I recommend 
rethinking that.


Do you have some "don't because" examples?   Seems to me, offhand, that if 
it's 99% or 99.9% then a high value does no harm.  Perhaps half what 
I have would be sufficient, though.


Bayes is a statistical method and so will always make some errors, as in 
this case. BY DEFINITION, one in a hundred messages hitting BAYES_99 
will be ham, as will one in a thousand that hits BAYES_999.
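If the locally raised scores are reconsidered, local.cf can pull them back toward the stock values. The numbers below are approximate; check 50_scores.cf in your rules channel for the exact defaults:

```
# local.cf -- approximately stock values; BAYES_999 is deliberately small
# because it only ever fires on top of BAYES_99
score BAYES_99  3.5
score BAYES_999 0.2
```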


I can't claim that the default scores are the best possible ones, but 
they don't result in many false positive *final scores* for most people.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole
On 2024-09-24 at 12:59:51 UTC-0400 (Tue, 24 Sep 2024 12:59:51 -0400)
Jared Hall via users 
is rumored to have said:

> On 9/24/2024 10:10 AM, Matus UHLAR - fantomas wrote:
>>
>> I understand this case as "abusers" instead of users.
> One man's use is another man's abuse.  Limits are reached and False Negatives 
> are produced by DNSWL.
>
[analysis points elided]

> 1) Contraction in the Email services market; less "systems" expertise is 
> available.

A problem we've been battling forever. It won't be getting better any time soon.

> 2) DIY installs also "dumb-down" systems knowledge requirements.

Yes, and we can't hope to fight against that. Many systems these days are being 
packaged in ways that "just work" with minimal effort or knowledge, for the 
large majority of the user audience.

> 3) SA has a desire to provide some protection in a default installation.

Because we know that people do as little work as possible to get something 
"working" even if that's in a degraded state.

> 4) Migration to Zero-Trust environments.

Unclear relevance...

> 5) Integration of DNS into O/S (like the stub resolver problems in 
> Debian/Ubuntu) - can't just slap BIND on a machine anymore.

Sounds like a Linux problem :)

It's actually quite easy on many platforms to bring up an Unbound recursive 
resolver for local resolution. I think it was actually presented as a choice 
for the latest Alma(EL9) and FreeBSD machines I loaded from install media. Yes, 
the systemd resolver is garbage.

> I am 100% FOR dropping DNSWL, any way it is done, although I don't have any 
> problem with the existing handling of BLOCKED responses from Validity, 
> SpamHaus, and others.  It *seems to me* that DNSWL-type services are better 
> used as overrides at SMTP-time to DNSBL blocks.

People will always differ on which tools to use at which layer, but I tend to 
agree that positive reputation sources are best applied to keep mail away from 
ever hitting SA. SA is a semi-transparent black box. It makes mistakes 
*intrinsic to its design* in both directions. If you want to exempt mail from 
ever being blocked by SA, not showing it to SA is best. That's why I use a 
relatively heavyweight milter (MIMEDefang) which can choose whether or not to 
expose messages to SA.

>>> Doing that is a legitimate choice by a reputation service, but it's not one 
>>> SA can endorse. The fact that it is enforced by whim rather than 
>>> mechanically is not a positive factor.
>> Is there any possibility to detect clients using open DNS, perhaps other 
>> than RCVD_IN_ZEN_BLOCKED_OPENDNS ?
>>
>> Then, block all dnsbl/rhsbl rules?
>>
> I don't see any truly viable solution without conducting other lookups first. 
>   A possible alternative would be to configure an unrestricted open DNS 
> server that returns to the client, in response to a query, the IP address of 
> the DNS host from where the query originated.  Sort of like the old, 
> never-used, TCP Echo service.
>
> Of course, the devil is in the details.  But I like your thinking Matus :)  
> My mind is about as sharp as a cooked linguine noodle. I'm sure there are a 
> lot of people out there that can conjure up better solutions.

As I said in a previous message: patches are welcomed.


-- 
Bill Cole


Apology (was Re: ATTENTION: DNSWL to be disabled by default.)

2024-09-24 Thread Bill Cole
On 2024-09-24 at 09:13:16 UTC-0400 (Tue, 24 Sep 2024 09:13:16 -0400)
Bill Cole 
is rumored to have said:

> On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200)
> Matthias Leisi 
> is rumored to have said:
> (Quoting me)
>>>
>>> people who don't configure it correctly, in a way that is *almost 
>>> invisible.* The lower rate limit which they established in March of this 
>>> year isn't inherently bad, it just meant that enough people were hitting 
>>> the limit that someone bothered to open a bug about it.
>>>
>>
>> There is no new rule. The limit of 100‘000 per 24 hours has been in place 
>> for years.
>
> That's an interesting assertion. The page I cited has apparently changed in 
> the past day and the previous statement of a new policy has vanished. I'm 
> happy with assuming that it was an error that you've corrected.

I WAS WRONG.

The apparent explanation for that error is that I had both of these pages 
opened and somehow conflated them.

https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US
https://www.dnswl.org/?p=120

I am sorry for suggesting that this was a change, as it was clearly entirely my 
error. I have corrected the error in the rules file comment. Sadly, sent mail 
is forever...

-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole
On 2024-09-24 at 05:09:50 UTC-0400 (Tue, 24 Sep 2024 11:09:50 +0200)
Tom Bartel 
is rumored to have said:

> I'm not sure if the 10,000 limit is possibly in reference to the Validity
> allow list...
>
> https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US
>
> We recently added a registration gate - no fees for usage above 10,000 / 30
> days, however registration of your query IPs will give you that capability.

MEA CULPA.

I'm not sure how I managed to do it, but that is almost certainly the 
explanation of my obvious error.


>
> Tom
>
> On Tue, Sep 24, 2024 at 10:16 AM Peter Ajamian 
> wrote:
>
>> On 24/09/24 05:02, Bill Cole wrote:
>>> Note
>>> that as of 2024-03-01 (as documented at the DNSWL link above) they have
>>> reduced the free limit to 10,000 queries per 30 days. A site feeding 350
>>> messages/day to SpamAssassin will exceed that limit. That is small even
>>> for "personal" systems.
>>
>> I've hunted through the links and the DNSWL.org site and cannot find any
>> reference to 10,000 queries per 30 days.  I do find lots of references
>> to the 100,000 queries per day limit, though.  Can you point out exactly
>> where the 10,000 reference is?
>>
>>
>> Thanks,
>>
>>
>> Peter
>>
>
>
> -- 
> Phone: 303.517.9655
> Website: https://bartelphoto.com
> Instagram: https://instagram.com/bartel_photo
>
> "Life's most persistent and urgent question is, 'What are you doing for
> others?'" - Martin Luther King Jr.


-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole
On 2024-09-24 at 10:10:24 UTC-0400 (Tue, 24 Sep 2024 16:10:24 +0200)
Matus UHLAR - fantomas 
is rumored to have said:

>>>> TL;DR: Rather than using an in-band signal of a special reply value to 
>>>> queries from blocked users, as do other DNS-Based List operators, 
>>>> DNSWL.org sends back a "listed high" response to all queries. I was unaware
>
>> On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200) 
>> Matthias Leisi  is rumored to have said:
>>> Not to all queries. It is sent to resolvers who consistently go above the 
>>> limits, sometimes for months and years after receiving the blocked response.
>
> On 24.09.24 09:13, Bill Cole wrote:
>> I don't see how that's significant. The documented policy is directly and 
>> intentionally harmful to users.
>
> I understand this case as "abusers" instead of users.

In the context of spam control tactics, I'm not ready to call people who have 
no idea (and no way to see) that they are part of abusive behavior, "abusers."

E.g. the cited bug. It was reported by someone with no control of their SA 
config, as it is handled by their "web host." Presumably they use something 
like cPanel which puts email in the hands of the platform provider rather than 
the domain owner. The provider may (or may not) have seen the BLOCKED replies 
whenever they actually occurred, but the end user only knows that he now gets 
mail from the worst spammers marked as definitively good by DNSWL, courtesy of 
SA. That's bad for the user, for SA, for DNSWL, and for the host.

>> Doing that is a legitimate choice by a reputation service, but it's not one 
>> SA can endorse. The fact that it is enforced by whim rather than 
>> mechanically is not a positive factor.
>
> Is there any possibility to detect clients using open DNS, perhaps other than 
> RCVD_IN_ZEN_BLOCKED_OPENDNS ?
>
> Then, block all dnsbl/rhsbl rules?

That sounds like a *great* idea and I'm sure it could be implemented.

Patches welcome, always. This list's sibling at dev@s.a.o is the ideal place to 
discuss implementation detail with others. Those of us able to commit to the 
repo are always happy to add other people's code and credit it, but for the 
most part the evidence supports the conclusion that as a group we are not 
wealthy enough in free time to add features to SA.

Another approach which could be simpler is to score the *_BLOCKED rules 
strongly enough to set off alarms. I don't like that much because it is using 
damage to get attention, but at least it would lead alarmed users to a correct 
conclusion about the root cause, rather than misrepresenting a reputation 
service's actual answer.
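A sketch of that alarm-style approach in local.cf, using Validity BLOCKED rule names mentioned elsewhere in this thread (the values are illustrative only, not recommendations):

```
# local.cf -- make blocked-query markers loud enough that an admin
# investigates the broken DNS setup; scores are illustrative, not endorsed
score RCVD_IN_VALIDITY_RPBL_BLOCKED 5.0
score RCVD_IN_VALIDITY_SAFE_BLOCKED 5.0
```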

-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole

On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200)
Matthias Leisi 
is rumored to have said:
(Quoting me)


people who don't configure it correctly, in a way that is *almost 
invisible.* The lower rate limit which they established in March of 
this year isn't inherently bad, it just meant that enough people were 
hitting the limit that someone bothered to open a bug about it.




There is no new rule. The limit of 100‘000 per 24 hours has been 
in place for years.


That's an interesting assertion. The page I cited has apparently changed 
in the past day and the previous statement of a new policy has vanished. 
I'm happy with assuming that it was an error that you've corrected.


However, as I said, the only significance of a particular rate limit is 
how many people are affected. The scale of the harm is not relevant; the 
problem is the intentional infliction of harm on users who likely have 
no idea what is happening.


This change in the SA rules was supposed to have been made 13 years ago. 
That's when the decision was made, based on the 100k/day threshold. The 
only reason I felt the need to announce it was the fact that back in 
2011, the intended change did not actually happen, so people have been 
using DNSWL even while the relevant rules file stated that the rules 
were disabled by default.


Enforcement of the limit is intentionally „weak“, we only look at 
new „overusers“ every few weeks.


Irrelevant. The policy is intentionally harmful. Its weak enforcement 
could even be seen as a problem per se.


TL;DR: Rather than using an in-band signal of a special reply value 
to queries from blocked users, as do other DNS-Based List operators, 
DNSWL.org sends back a "listed high" response to all queries. I was 
unaware




Not to all queries. It is sent to resolvers who consistently go above 
the limits, sometimes for months and years after receiving the blocked 
response.


I don't see how that's significant. The documented policy is directly 
and intentionally harmful to users. Doing that is a legitimate choice by 
a reputation service, but it's not one SA can endorse. The fact that it 
is enforced by whim rather than mechanically is not a positive factor.


# DNSWL is a commercial service that requires payment for servers 
over 100K queries daily.




The subscriptions to dnswl.org easily cover the infrastructure cost, 
but not much more.


— Matthias, for the dnswl.org project


Semantic dispute. Charging a fee for a service is intrinsically and 
unavoidably commercial. I appreciate that you are not running the 
service as a means of building wealth.


Personally, I consider the existence of DNSWL to be positive for the 
email ecosystem. I believe that sites which stay within the limit can 
reduce FPs by using it. That does not change the basic fact that using 
it blindly is dangerous. Just as new system installations don't deploy a 
fully-functioning MTA to accept external mail, SA strives to NOT enable 
dangerous 3rd-party tools by default.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-23 Thread Bill Cole

On 2024-09-23 at 13:08:17 UTC-0400 (Mon, 23 Sep 2024 17:08:17 +)
Grega via users 
is rumored to have said:

Maybe disable the VALIDITY rule as well... They also have a 10k limit in a 
30-day window...


My understanding is that Validity returns a specific value 
(127.255.255.255) for blocked queries. That makes it safe to have the 
rules enabled because you then hit the BLOCKED rule for the specific 
Validity list, which has a trivial non-zero score. That is a *visible 
and harmless* marker on almost every message which should be noticed by 
the user, who can correct the underlying configuration error.


DNSWL.org *intentionally causes harm* for people who don't configure it 
correctly, in a way that is *almost invisible.* The lower rate limit 
which they established in March of this year isn't inherently bad, it 
just meant that enough people were hitting the limit that someone 
bothered to open a bug about it.


As I noted in my lengthy comment in that bug report, we (the SA 
community, particularly committers) are not an organized workforce with 
duties and assignments, and we make changes to established 
statically-scored rules on an as-noticed and as-needed basis. This is 
partly because we are considerate of the fact that we have users who 
build on top of the mostly-stable default rules. It is also because we 
are all volunteers, with lives and jobs that generally take priority 
over making SA better.






Regards, G

____
From: Bill Cole 
Sent: Monday, September 23, 2024 19:03
To: SpamAssassin-Users
Subject: ATTENTION: DNSWL to be disabled by default.


Context:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8193
https://www.dnswl.org/?p=120

TL;DR: Rather than using an in-band signal of a special reply value to 
queries from blocked users, as do other DNS-Based List operators, 
DNSWL.org sends back a "listed high" response to all queries. I was 
unaware of this until bug 8193 was opened and linked to the DNSWL 
statement of that policy. As I write in a comment on that bug, no one 
should ever be using DNSBLs of any sort blindly and the onus is on the 
configuring user of SA to select them prudently as they all have 
limits.



I believe this is a problem that needs fixing, but it's a change that 
may surprise some users. Consider yourself warned...


Right now, there's a comment in 50_scores.cf (the file for 
manually-set scores) that I had not previously seen:


# DNSWL is a commercial service that requires payment for servers over 100K queries daily.
# Unfortunately, they will return true answers for DNS servers they consider abusive so
# SA Admins must enable these rules manually.

And yet, the scores following that comment *enable* the rules. Note 
that as of 2024-03-01 (as documented at the DNSWL link above) they 
have reduced the free limit to 10,000 queries per 30 days. A site 
feeding 350 messages/day to SpamAssassin will exceed that limit. That 
is small even for "personal" systems.


Pending a discussion on the issue reaching some other consensus, I am 
immediately changing all those scores to zero in 50_scores.cf so that 
the rules WILL BE DISABLED by default as documented in the comment. I 
am also correcting the rate cited in that comment. This change should 
take effect in the rules distribution in the next couple of days.


Whether or not you want to use DNSWL is very much a local choice. At 
10k queries/month, MOST sites will need to either register (and likely 
pay DNSWL) or leave the rules disabled.


   b...@scconsult.com or billc...@apache.org
   (AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

   Not Currently Available For Hire



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


ATTENTION: DNSWL to be disabled by default.

2024-09-23 Thread Bill Cole

Context:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8193
https://www.dnswl.org/?p=120

TL;DR: Rather than using an in-band signal of a special reply value to 
queries from blocked users, as do other DNS-Based List operators, 
DNSWL.org sends back a "listed high" response to all queries. I was 
unaware of this until bug 8193 was opened and linked to the DNSWL 
statement of that policy. As I write in a comment on that bug, no one 
should ever be using DNSBLs of any sort blindly and the onus is on the 
configuring user of SA to select them prudently as they all have limits.
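For admins who conclude they cannot use DNS-based lists prudently at all, SA offers a blanket (and blunt) switch in local.cf:

```
# local.cf -- skip all DNS blocklist/welcomelist (RBL) queries entirely
skip_rbl_checks 1
```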



I believe this is a problem that needs fixing, but it's a change that 
may surprise some users. Consider yourself warned...


Right now, there's a comment in 50_scores.cf (the file for manually-set 
scores) that I had not previously seen:


	# DNSWL is a commercial service that requires payment for servers over 100K queries daily.
	# Unfortunately, they will return true answers for DNS servers they consider abusive so
	# SA Admins must enable these rules manually.

And yet, the scores following that comment *enable* the rules. Note 
that as of 2024-03-01 (as documented at the DNSWL link above) they have 
reduced the free limit to 10,000 queries per 30 days. A site feeding 350 
messages/day to SpamAssassin will exceed that limit. That is small even 
for "personal" systems.


Pending a discussion on the issue reaching some other consensus, I am 
immediately changing all those scores to zero in 50_scores.cf so that 
the rules WILL BE DISABLED by default as documented in the comment. I am 
also correcting the rate cited in that comment. This change should take 
effect in the rules distribution in the next couple of days.


Whether or not you want to use DNSWL is very much a local choice. At 10k 
queries/month, MOST sites will need to either register (and likely pay 
DNSWL) or leave the rules disabled.
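Sites that register with (or pay) DNSWL and want the rules back can restore non-zero scores locally. The rule names are the standard DNSWL ones; the scores shown are the commonly cited former defaults, so verify them against your rules channel before relying on them:

```
# local.cf -- re-enable DNSWL rules after registering your resolver IPs
score RCVD_IN_DNSWL_LOW -0.7
score RCVD_IN_DNSWL_MED -2.3
score RCVD_IN_DNSWL_HI  -5.0
```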


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Disable validity rules

2024-09-23 Thread Bill Cole

On 2024-09-23 at 09:15:25 UTC-0400 (Mon, 23 Sep 2024 13:15:25 +)
Grega via users 
is rumored to have said:


Hi.


Where can one disable this?


One can disable any rule by adding a score line in local.cf for the rule 
with a score of 0, e.g.:



score  RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  0



RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
RCVD_IN_VALIDITY_RPBL_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
RCVD_IN_VALIDITY_SAFE_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
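So zeroing all three of the rules listed above in local.cf would look like:

```
score RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0
score RCVD_IN_VALIDITY_RPBL_BLOCKED      0
score RCVD_IN_VALIDITY_SAFE_BLOCKED      0
```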


Thanks!



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Tips on training bayes?

2024-09-19 Thread Bill Cole

On 2024-09-17 at 16:29:52 UTC-0400 (Tue, 17 Sep 2024 16:29:52 -0400)
Alex 
is rumored to have said:




It is up to the user, i.e. you, what is and what is not spam.



Well, yes, and no.

Of course it's my own system and I can define these terms however I wish.
I'm also familiar with the need to investigate every message - perhaps I
should have made that clear initially.

It's only these few types of messages that are very subjective, and
experience from the broader open source community would be appreciated.


The debate over the specific definition of "spam" is an old and diverse 
conversation. It has damaged friendships and careers.



If it has a legitimate unsubscribe link, does that make it ham?


No.


What criteria do you use to determine "spamminess/haminess of EVERY
message"?


The Official Lumber Cartel acronym for spam is UBE:

Unsolicited: the sender has no sound reason to believe that the target 
requested this particular email (or narrowly defined class of email.)


Bulk: the sender appears to have sent substantially the same message to 
many different people without meaningful targeting. This can be inferred 
from generic content directed at the widest audience, e.g. commercial or 
political advertising.


Email: obvious.

Judging that requires some knowledge of the target. I can't tell you 
whether your borderline email is spam. Neither can SA, but Bayes is one 
way it tries to guess.


Is the goal to have every message one of either BAYES_00 or BAYES_99, or is
it okay that newsletters (for example) are BAYES_50, and let other rules,
like network checks, determine the score?


The logical model of Naive Bayesian classification is for strictly 
binary classes. A message is either ham or spam. Identical messages can 
be ham in one mailbox and spam in another, so I suppose one could more 
accurately see the classification as being of the combined email and its 
envelope of metadata.


Bayesian classification does NOT provide a degree of "spamminess" in 
email, it provides a probability of mail being spam. That is a subtle 
but important distinction. A 50% Bayes score doesn't mean a message is 
semi-spam, it means Bayes cannot tell whether the message is spam. So 
yes, it is *OK* that Bayes can't tell whether a newsletter that has 
spam-like content but has an unsub link going to a usually-good ESP is 
spam or ham. A lot of email is that way: its insane HTML and/or 
hype-filled wording smells like spam but since the target wants it, it's 
ham.


This is a core design principle in SA: there's no perfect objective test 
for spam. That's why we have hundreds of scored rules and sub-rules and 
multiple shared reputation tests. A single test (such as Bayes) being 
wrong is not a flaw, it is an inescapable attribute of SA's design.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Use of uninitialized value $response[0]

2024-09-17 Thread Bill Cole
On 2024-09-17 at 13:10:13 UTC-0400 (Tue, 17 Sep 2024 18:10:13 +0100)
Niamh Holding 
is rumored to have said:

> Hello
>
> I'm seeing the following logged by Procmail in one and only one mailbox, and 
> as far as I can see there is no difference in the Procmail recipe calling 
> SpamAssassin across the mailboxes:
>
>  Procmail: Match on "< 256000"
> procmail: Locking "spamassassin.lock"
> procmail: Executing "/usr/local/bin/spamassassin"
> Sep 17 18:08:24.727 [16350] warn: no response
> Sep 17 18:08:24.727 [16350] warn: Use of uninitialized value $response[0] in 
> pattern match (m//) at 
> /usr/local/share/perl5/Mail/SpamAssassin/Plugin/Pyzor.pm line 307.
> procmail: [16344] Tue Sep 17 18:08:25 2024
> procmail: Unlocking "spamassassin.lock"

You should upgrade to 4.0.1. That error on that line indicates that you are 
running an obsolete 3.4.x version.

The likely root cause there is the lack of any reply from the Pyzor server, 
which is unlikely to be a per-user condition.

-- 
Bill Cole


Noise Around This List (was Re: Bayes in V4 compared to V3)

2024-09-13 Thread Bill Cole

On 2024-09-13 at 09:13:58 UTC-0400 (Fri, 13 Sep 2024 15:13:58 +0200)
Benny Pedersen 
is rumored to have said:


Bill Cole skrev den 2024-09-13 15:03:


Please send any replies to the list only.


unsubscribe listarchivers ?

and make archived on apache.org with bugzilla login

don't know if it will help or not, but chicken and egg


ASF has a core principle that our projects are managed and supported 
transparently. Restricting the ability to read any users@*.a.o list 
would be a severe departure from that principle. Subscribers must be 
disruptive to the list on a persistent basis to be banned, by a 
unanimous consensus of the PMC. It is a very high bar.


Note that we also don't exert prior control over who can submit to our 
Bugzilla.  We handle spam there on a whack-a-mole basis, which has 
proven adequate for many years.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Bayes in V4 compared to V3

2024-09-13 Thread Bill Cole
Please note that "Reindl Harald" is excluded from posting to the 
SpamAssassin Users mailing list as a consequence of past behavior. It is 
my understanding that they still follow the list via some public archive 
and reply off-list whenever they have an opportunity to be rude towards 
people with SpamAssassin difficulties.


Whether or not their advice is worth considering is obviously a personal 
judgment, but you should be aware that you are speaking with someone who 
has in the past worked to disrupt this list (and others.)


Please send any replies to the list only.

On 2024-09-13 at 05:00:17 UTC-0400 (Fri, 13 Sep 2024 09:00:17 +)
Grega 
is rumored to have said:


Do you have V3 or V4 SA?



From: Reindl Harald (privat) 
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3

autolearn was always a blackbox

the stats below are for the current month; bayes is built from 2014
until now, and I rebuild it from scratch every month

the corpus of 178.138 messages is stored as single eml-files

a few errors with autolearn over the years can amplify and render your
bayes useless over time, with no way to do anything because you don't
have the corpus and don't know what was trained and how

[root@mail-gw:~]$ bayes-stats.sh
0  135700  SPAM
0   42438  HAM
0 5116765  TOKEN

total 514M
 24K -rw-r- 1 sa-milt sa-milt  24K 2024-09-12 14:11 bayes_seen
129M -rw-r- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db

BAYES_00   4455   45.10 %
BAYES_05    363    3.67 %
BAYES_20    471    4.76 %
BAYES_40    440    4.45 %
BAYES_50   2106   21.32 %
BAYES_60    119    1.20 %    5.87 % (OF TOTAL BLOCKED)
BAYES_80    108    1.09 %    5.33 % (OF TOTAL BLOCKED)
BAYES_95     81    0.82 %    4.00 % (OF TOTAL BLOCKED)
BAYES_99   1735   17.56 %   85.72 % (OF TOTAL BLOCKED)
BAYES_999  1572   15.91 %   77.66 % (OF TOTAL BLOCKED)

DELIVERED    13865   88.15 %
DNSWL        14376   91.40 %
SPF          15203   96.66 %
SPF/DKIM WL   5705   36.27 %
SHORTCIRCUIT  5894   37.47 %

BLOCKED  2024   12.86 %
SPAMMY   2043   12.98 %   100.93 % (OF TOTAL BLOCKED)

On 13.09.24 at 10:51, Grega wrote:

This strategy worked really great in V3 and bayes was excellent even
with autotrain and occasional manual training.


Now it's indecisive and useless, at least for me.

We have around 5k-7k daily mails...




*From:* Reindl Harald (privat) 
*Sent:* Friday, 13 September 2024 10:22
*To:* Grega; Bill Cole; Grega via users
*Subject:* Re: Bayes in V4 compared to V3


On 13.09.24 at 06:53, Grega via users wrote:
And I'm reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
auto-train on correct mails...


this is even more nonsense than autolearn itself

what you really want to train are wrongly classified messages, and the
decision can only be made by a human

if you train wrongly classified mails in both directions you amplify
the incorrect result

it happens that HAM MAILS have a score above 12 from time to time
because of blacklists and over-aggressive rules, and when you then
autolearn the content as spam your bayes will end up in the state it is in now



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Bayes in V4 compared to V3

2024-09-12 Thread Bill Cole

On 2024-09-12 at 14:05:11 UTC-0400 (Thu, 12 Sep 2024 18:05:11 +)
Grega via users 
is rumored to have said:


Hi.

I have SA 4.0.1 configured and all is good, except for bayes. It IS 
working, it IS learning, but when it classifies mail it is really not 
as decisive as it was in V3.

I have:

dbg: bayes: corpus size: nspam = 1190, nham = 12441
dbg: bayes: DB expiry: tokens in DB: 979401, Expiry max size: 150, Oldest atime: 
1725361640, Newest atime: 1725888528, Last expire: 0, Current time: 
1725888537

So I have enough spam/ham and really enough tokens...
What I find weird is this:
BAYES_50 and BAYES_40 have like 10,000 hits EACH, which is A LOT
BAYES_80: only 600
BAYES_95: even less, 341
BAYES_99: 284
BAYES_20: only 150
BAYES_60: only 87
I have no BAYES lower than 40 at all.


What's that BAYES_20 line then?


I am training and also use autolearn.
I have also transferred corpus trained on SA v3 where it worked 
correctly.
Is Spamassassin v4 really so much more conservative or am I doing 
something wrong here?


There were substantial changes in the Bayes module between v3 and v4. 
Training the exact same corpus in the exact same order into v3.4x and 
4.0x will yield different scores, due to *bug fixes* and *improvements* 
in parsing headers. In principle this should make scoring more 
consistent and accurate, which may mean fewer extreme scores. In theory, 
better parsing should result in some common tokens being split 
differently, yielding more diversity in their metrics. We also updated 
'stopword' lists for various languages, removing tokens that are so 
common that they cannot help scoring in principle.
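One way to compare a migrated database against its v3 source is sa-learn's magic dump, which reports the nspam/nham/token counts that the corpus-size debug lines above are derived from (run it as the same user that owns the Bayes DB):

```
# show spam/ham/token counts and atime ranges for the active Bayes DB
sa-learn --dump magic
```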


So, no, you are not doing anything wrong. We may need to re-examine the 
default scores for the BAYES_* rules to adapt but that has no concrete 
plan behind it.


With that said, I looked at recent logs on one system running the SA 
development trunk (which has no added Bayes changes relative to 4.0.1) 
and got this distribution:


16444 BAYES_00
  20 BAYES_05
  22 BAYES_20
  13 BAYES_40
  64 BAYES_50
   2 BAYES_60
   6 BAYES_80
   2 BAYES_95
 139 BAYES_99
 138 BAYES_999

That is a machine that excludes most blatant spam at the SMTP layer, 
without handing it to SA.




Also;
One more thing...
Some mails don't even have a BAYES rule in the score list; confirmed on 2 
installs


How many?

While you are initially training the Bayes DB and lack adequate ham and 
spam counts, you get no BAYES hits. Also, if you have any rules set to 
"shortcircuit" they can cause SA to stop checking before Bayes is done.
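For reference, a shortcircuit setup that suppresses Bayes in this way looks like the following. This is only a sketch using the stock Shortcircuit plugin options, not a recommendation; any message matching the shortcircuited rule is scored without Bayes ever running.

```
# local.cf sketch: if the welcomelist rule fires, stop scanning
# immediately, so Bayes (and every later rule) never runs on that mail.
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  shortcircuit USER_IN_WELCOMELIST on
endif
```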


I *think* I've also seen Bayes skip on excess load, with too much lock 
contention on a file-based mechanism like Berkeley DB.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: M365 phish with USER_IN_DKIM_WHITELIST

2024-08-30 Thread Bill Cole

On 2024-08-30 at 13:35:02 UTC-0400 (Fri, 30 Aug 2024 13:35:02 -0400)
Alex 
is rumored to have said:


Hi,
I'm hoping someone can help me understand how what appears to be an 
invoice

scam was passed through legitimate MS servers and
even USER_IN_DKIM_WHITELIST.


USER_IN_DKIM_WHITELIST refers to an explicit (i.e site or user-specific) 
welcomelist, so this you did to yourself...



From: Microsoft 


There you go. *You* welcomelisted microsoft.com.

And Microsoft signed and sealed that mail. They believe it is entirely 
legit. They are not actually a reliably trustworthy entity on that 
topic, in fact I'd say they are quite prominently lousy at it.



Date: Fri, 30 Aug 2024 15:50:53 +
Subject: Your Microsoft order on August 30, 2024
Message-ID: 
<1ccff35e-284a-4b08-bef9-737552452...@az.westus3.microsoft.com>

To: rebeccaflam...@rebeccaflaming.onmicrosoft.com

It also hit a few of my local test rules, including one that hits when 
MS
mail is sent to us with a different To domain, but it received a 
negative

score because of being on the default DKIM whitelist.


It is NOT on the default list. That would be a hit on the 
USER_IN_DEF_*LIST rules. The only MS domain in the default list is 
accountprotection.microsoft.com. The rest is garbage...



https://pastebin.com/fmjK9AfK


Microsoft signed it. You have a rule that says you trust Microsoft to 
sign only their own non-spam mail.


Everyone makes trust errors... It's a recurring trope of many lives and 
of history.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: QR phish missed

2024-08-16 Thread Bill Cole

On 2024-08-16 at 08:03:05 UTC-0400 (Fri, 16 Aug 2024 08:03:05 -0400)
Alex 
is rumored to have said:


It says that SPF failed, but SPF_PASS was hit, presumably from our
connection to Microsoft, not their connection to the spammer client:


Correct. You can only check SPF on the first SMTP transaction guided by 
an MX record and recorded by a trusted server.


Received-SPF: Fail (protection.outlook.com: domain of toppersrvs.com 
does

not
 designate 35.230.39.135 as permitted sender) receiver=
protection.outlook.com;
 client-ip=35.230.39.135; helo=[127.0.0.1];

Received-SPF: Pass (mailfrom) identity=mailfrom; 
client-ip=52.100.167.207;

helo=nam12-mw2-obe.outbound.protection.outlook.com; envelope-from=
administra...@toppersrvs.com; receiver=buckknives.com

ARC also failed:
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=fail (sender 
ip is

 35.230.39.135) smtp.rcpttodomain=buckknives.com smtp.mailfrom=
toppersrvs.com;
 dmarc=none action=none header.from=toppersrvs.com; dkim=none (message 
not

 signed); arc=none (0)

Should I also somehow be checking these SPF failures?


You really can't with SA, because it is not generally safe to trust the 
Received headers written by systems you don't control or have some sort 
of explicit relaying arrangement with. Because the initial submission of 
messages CANNOT be subjected to SPF tests, you don't want to test 
transactions that are not following an MX record.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Blocking Malformed "From" Headers

2024-07-18 Thread Bill Cole

On 2024-07-17 at 13:17:16 UTC-0400 (Wed, 17 Jul 2024 10:17:16 -0700)
Kirk Ismay 
is rumored to have said:


I have a spammer using a malformed From header, as follows:

From: sha...@marketcrank.com

The envelope from is: direcc...@delher.com.mx, and I've set up blocks 
for that address.


Sendmail is munging the From: header to change  to 
, so it ends up looking like a local address to my 
users.


How do I detect similar mangled From headers in Spamassassin?


I believe SA already has a more general rule that will catch the *BAD* 
form, but depending on how you've integrated SA and Sendmail, it may 
only see the "cleaned up" form that Sendmail provides. I believe SA sees 
the unmolested headers only in a milter interface, NOT if you've got it 
hooked into a mailer.


If not, here's a rule that should work:

header FROM_ANGLE_UNQUAL  From =~ /<[^<\@]*>[^\@]*\@/
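If you want to sanity-check that pattern before deploying it, the same expression can be exercised from the shell: the rule's character classes work identically as an ERE once the Perl `\@` escapes are dropped. Both sample headers are hypothetical.

```shell
# The rule's regex as an ERE; first header is the malformed shape
# (unqualified angle address followed by a real address), second is a
# normal qualified From header.
re='<[^<@]*>[^@]*@'
echo 'From: <shadow> real@marketcrank.com' | grep -qE "$re" && echo 'hits'
echo 'From: Joe <joe@example.com>' | grep -qE "$re" || echo 'no hit'
```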

Also does anyone know how to prevent Sendmail from rewriting the From 
header like this?  The documentation for confFROM_HEADER is somewhat 
cryptic:


https://www.sendmail.org/~ca/email/doc8.12/cf/m4/tweaking_config.html#confFROM_HEADER

I'd rather it say  instead, or reject it 
entirely.


Thanks,
Kirk


Remove FEATURE(always_add_domain) from your .mc and remake sendmail.cf. 
Consult the Ops guide and/or cf/README for all of the effects of that.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: X-Amavis-Alert: BANNED, message contains x.com

2024-07-16 Thread Bill Cole

On 2024-07-16 at 11:55:50 UTC-0400 (Tue, 16 Jul 2024 17:55:50 +0200)
Benny Pedersen 
is rumored to have said:


Thomas Barth via users skrev den 2024-07-16 17:28:


X-Quarantine-ID: 
X-Amavis-Alert: BANNED, message contains x.com



Are there any further explanations for the banning of x.com?


ask on amavis maillist

are spamassassin using extractext ?

asking to be sure


That is NOT a SpamAssassin message, as SA does nothing so silly. It is 
clearly and strictly an Amavis issue.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Requesting help, sa-update, cron, gpg, unsafe ownership on homedir

2024-07-12 Thread Bill Cole

On 2024-07-12 at 10:51:08 UTC-0400 (Fri, 12 Jul 2024 10:51:08 -0400)
Steve Charmer 
is rumored to have said:


I have a cron job running as root, which calls sa-update

it warns about unsafe ownership


gpg: WARNING: unsafe ownership on homedir
`/var/lib/spamassassin/sa-update-keys'


Note that this is only a warning, not a failure.





this is my current ownership

ls -la /var/lib/spamassassin/sa-update-keys
total 16
drwx-- 2 spamd root  4096 Jun 20  2017 .
drwxr-xr-x 7 spamd spamd 4096 Nov 22  2018 ..
-rwx-- 1 spamd root  2783 Jun 20  2017 pubring.gpg
-rwx-- 1 spamd root 0 Jun 20  2017 pubring.gpg~
-rwx-- 1 spamd root 0 Jun 20  2017 secring.gpg
-rwx-- 1 spamd root  1200 Jun 20  2017 trustdb.gpg



I've read that the ownership should be root,


Would reading that advice again help you follow it? :)

Make the owner root.


so does having the owner =
spamd, and the group = root, causing that warning?


I'm betting yes, although I have not tested it. The definitive answer 
would come from looking at the gpg documentation, I expect.



I thought having group =
root would fix any ownership issues.


It will not, because gpg wants its keys to be owned by the user running 
gpg and no one else. It works with this setup because you're running as 
root, but gpg still knows that those keys belong to someone else.



I cannot recall now, why I set owner
to spamd. maybe spamd could not read the gpg keys when trying an 
update

before?


Why would a program run as root need that?

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: whitelist_auth return_path / from

2024-07-03 Thread Bill Cole

On 2024-07-03 at 10:19:28 UTC-0400 (Thu, 04 Jul 2024 00:19:28 +1000)
Simon Wilson via users 
is rumored to have said:


On 03.07.24 23:54, Simon Wilson via users wrote:

Simon Wilson via users skrev den 2024-07-03 14:56:

Do I also need to disable the normal SA DKIM plugin evaluation, i.e.
trusting my upstream authres_trusted_authserv only?


both work in parallel, so no need to disable; best results came from 
both enabled

it's up to you to add more authres_trusted_authserv or more 
authres_ignored_authserv lines

possible we can now have a very long debate on dmarc plugin? :)


Please, Simon, quote the text you are replying to.
 
I have been - was that directed at Benny?
 


No, it is because your mail is multipart/alternative with a text/plain 
part that lacks any indicators of quoting. Looks like your MUA is 
broken.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: ChatGPT > Spamassassin? :)

2024-06-25 Thread Bill Cole

On 2024-06-25 at 17:38:28 UTC-0400 (Tue, 25 Jun 2024 17:38:28 -0400)
Mark London 
is rumored to have said:

Bill - Thanks for the response.  As an aside, it would be nice 
(though impossible?) for a spam filter to be more suspicious of emails 
coming from a new email address that is not in my Sent folder or my 
Inbox. FWIW. - Mark


Matija's mention of AWL/TxRep is correct here. While some people find it 
a nuisance when it makes one FP into an ongoing series, I think it is 
worth enabling for most sites.


However, if you do enable either of those tools, you should have a 
mechanism for feeding FPs into both a sitewide Bayes DB and into the 
AWL/TxRep DB, using the blocklist/welcomelist options of the 
spamassassin script.





On 6/25/2024 11:21 AM, Bill Cole wrote:

Mark London 
is rumored to have said:

I received a spam email with the text below, that wasn't caught by 
Spamassasin (at least mine).   The text actually looks like 
something that was generated using ChatGPT.  In any event,  I put 
the text through ChatGPT, and asked if it looked like spam.  At the 
bottom of this email , is it's analysis.  I've not been fully 
reading this group.  Has there been any work to allow Spamassassin 
to use AI?


"Artificial intelligence" does not exist. It is a misnomer.

Large language models like ChatGPT have a provenance problem. There's 
no way to know why exactly the model "says" anything. In a single 
paragraph, ChatGPT is capable of making completely and directly 
inconsistent assertions. The only way to explain that is that despite 
appearances, a request to answer the ham/spam question generates 
text with no semantic connection to the original, but which seems 
like an explanation.


SpamAssassin's code and rules all come from ASF committers, and the 
scores are determined by examining the scan results from contributors 
and optimizing them to a threshold of 5.0. Every scan of a message 
results in a list of hits against documented rules. The results can 
be analyzed and understood.


We know that ChatGPT and other LLMs that are publicly available have 
been trained on data to which they had no license. There is no way to 
remove any particular ingested data. There's no way to know where any 
particular LLM will have problems and no way to fix those problems. 
This all puts them outside of the boundaries we have as an ASF 
project. However, we do have a plugin architecture, so it is possible 
for 3rd parties to create a plugin for LLM integration.






--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: ChatGPT > Spamassassin? :)

2024-06-25 Thread Bill Cole

On 2024-06-24 at 17:18:11 UTC-0400 (Mon, 24 Jun 2024 17:18:11 -0400)
Mark London 
is rumored to have said:

I received a spam email with the text below, that wasn't caught by 
Spamassasin (at least mine).   The text actually looks like something 
that was generated using ChatGPT.  In any event,  I put the text 
through ChatGPT, and asked if it looked like spam.  At the bottom of 
this email , is it's analysis.  I've not been fully reading this 
group.  Has there been any work to allow Spamassassin to use AI?


"Artificial intelligence" does not exist. It is a misnomer.

Large language models like ChatGPT have a provenance problem. There's no 
way to know why exactly the model "says" anything. In a single 
paragraph, ChatGPT is capable of making completely and directly 
inconsistent assertions. The only way to explain that is that despite 
appearances, a request to answer the ham/spam question generates text 
with no semantic connection to the original, but which seems like an 
explanation.


SpamAssassin's code and rules all come from ASF committers, and the 
scores are determined by examining the scan results from contributors 
and optimizing them to a threshold of 5.0. Every scan of a message 
results in a list of hits against documented rules. The results can be 
analyzed and understood.


We know that ChatGPT and other LLMs that are publicly available have 
been trained on data to which they had no license. There is no way to 
remove any particular ingested data. There's no way to know where any 
particular LLM will have problems and no way to fix those problems. This 
all puts them outside of the boundaries we have as an ASF project. 
However, we do have a plugin architecture, so it is possible for 3rd 
parties to create a plugin for LLM integration.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Questions about spamassassin

2024-06-21 Thread Bill Cole
On 2024-06-20 at 19:17:19 UTC-0400 (Thu, 20 Jun 2024 18:17:19 -0500)
Paul Schmehl 
is rumored to have said:

> Here’s every line with bayes_ in it:
> bayes_#auto_learn 1
> bayes_learn_to_journal 1
> bayes_path /usr/local/etc/mail/spamassassin/bayes/bayes
> bayes_file_mode 0775
> bayes_ignore_header ReSent-Date
> bayes_ignore_header ReSent-From
> bayes_ignore_header ReSent-Message-ID
> bayes_ignore_header ReSent-Subject
> bayes_ignore_header ReSent-To
> bayes_ignore_header Resent-Date
> bayes_ignore_header Resent-From
> bayes_ignore_header Resent-Message-ID
> bayes_ignore_header Resent-Subject
> bayes_ignore_header Resent-To
>
> I think that first line looks problematic.

I agree. The spurious # would generate precisely the error message you got.
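A quick way to spot that kind of damage is to pull out every bayes_ line and drop the ones that look like a valid "option value" pair; whatever is left is the malformed line. This is a sketch, with a two-line sample file standing in for the real local.cf.

```shell
# Flag bayes_ lines whose option name contains characters outside
# [a-z0-9_] (like the stray '#'); adjust the path to your real local.cf.
cf=$(mktemp)
printf '%s\n' \
  'bayes_#auto_learn 1' \
  'bayes_path /usr/local/etc/mail/spamassassin/bayes/bayes' > "$cf"
# Keep only lines that do NOT look like a well-formed bayes_ option.
grep -n '^bayes_' "$cf" | grep -vE '^[0-9]+:bayes_[a-z_0-9]+ '
```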

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Questions about spamassassin

2024-06-20 Thread Bill Cole

On 2024-06-20 at 16:14:47 UTC-0400 (Thu, 20 Jun 2024 15:14:47 -0500)
Paul Schmehl 
is rumored to have said:

I’m running spamassassin (SA) 3.4, postfix 3.9.0-1, and dovecot 
2.2.36-8 on a linux server. I have some questions about SA that I 
can’t seem to find answers for on the web.


The SA conf files are /etc/mail/spamassassin. The bayes files are in 
/usr/local/etc/mail/spamassassin/bayes.


I’m running spamd as the content_filter in postfix. spamassassin 
unix -  n   n   -   -  pipe
user=spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f 
${sender} ${recipient}


Everything is working as expected, but I have some questions about 
permissions. Should spamd be the owner of /etc/mail/spamassassin?


No. It is entirely normal for any user to read the config files. The 
spamd user never needs to write to that directory or anything in it.



Of /usr/local/etc/mail/spamassassin?


Yes. The bayes_* files there are the active Bayes DB in use by the spamd 
daemon, so the user the daemon is running as needs to be able to do 
anything in that directory.


Today I got a warning about the unsafe perms on sa-update-keys. Who 
should own those and what should the perms be?


Files in that directory control whose signatures you trust on daily 
rules packages, so the directory should be owned by root, perms 0700.
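Applied to the layout shown earlier in the thread, that advice comes out as the following commands. This is a sketch exercised on a scratch copy so it is safe to try as any user; on the real system you would target /var/lib/spamassassin/sa-update-keys and additionally `chown -R root:root` it, since sa-update runs from root's cron here.

```shell
# Close the keyring directory and key files to everyone but the owner.
keys=$(mktemp -d)/sa-update-keys
mkdir -p "$keys"
touch "$keys/pubring.gpg" "$keys/trustdb.gpg"
chmod 0700 "$keys"        # drwx------ on the directory
chmod 0600 "$keys"/*.gpg  # -rw------- on each key file
```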




Finally, I’m seeing this in my maillogs.
config: failed to parse line, skipping, in 
"/etc/mail/spamassassin/local.cf": bayes_


This is the config in local.cf:
bayes_path /usr/local/etc/mail/spamassassin/bayes/bayes


Is there any other line in that file starting with 'bayes_'?

That error message is not lying to you: you have an error in local.cf 
which SA cannot parse around. Also look in the lines before the 
'bayes_path' line for unterminated quotes.




This is the contents of the bayes folder:
# ls -lsah /usr/local/etc/mail/spamassassin/bayes/
total 632K
   0 drwxrwxr-x 2 spamd spamd   63 Jun 20 11:36 .
   0 drwxrwxr-x 3 spamd spamd   19 Jun 13 06:00 ..
 96K -rw--- 1 spamd spamd  95K Jun 20 14:44 bayes_journal
 12K -rwxrwxrwx 1 spamd spamd  12K Jun 20 11:32 bayes_seen
524K -rwxrwxrwx 1 spamd spamd 664K Jun 20 11:32 bayes_toks

spamd owns the directory /usr/local/etc/mail/spamassassin and all 
subdirectories. The perms are 775 for the directories and 777 for all 
files.  (I did this for testing purposes. They normally would be 755 
and 644.)


I hope there's only you on that machine...

Using 'chmod 777' to troubleshoot permissions issues is always a bad 
idea.


Spam that are not caught by SA are moved to my junk folder, and I 
croned a script that parses those and feeds them into bayes_seen. That 
script is working, and the bayes_seen file is being updated. (I 
checked the timestamp on the file after running the script manually.)


I can’t make sense out of this error message. What am I missing?


It is a configuration file parsing error. It has nothing to do with 
permissions or ownership. There's an error in local.cf.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Docs confusion and missing dependency on EL9

2024-06-19 Thread Bill Cole

On 2024-06-19 at 01:28:20 UTC-0400 (Wed, 19 Jun 2024 07:28:20 +0200)
Gerald Vogt 
is rumored to have said:


Hi,

for testing I tried to install spamassassin 4.0.1 on EL9 (AlmaLinux 
9.4). I have noticed some dependencies are not mentioned on the 
INSTALL page:


I have had to install perl-ExtUtils-MakeMaker.noarch to run 
Makefile.PL


That module has been a part of the Perl "core" in all versions of Perl 
5.



I have had to install perl-Archive-Tar.noarch to run sa-update.


Archive::Tar has been in the core since Perl v5.9.3


Those two are nowhere mentioned.


A standard Perl installation of any version we support will have both of 
those.


RedHat, for reasons of their own, splits the Perl core into many 
packages. To get the standard core on any EL-based system, install the 
"perl" package.



It also took me a while to find the instructions on how to install.

I started at https://spamassassin.apache.org/index.html

where "Click here to get started using SpamAssassin! " looked 
promising.


But at

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/StartUsing

I have spent considerable time to look for where to download and how 
to actually install spamassassin, but eventually gave up. Only now I 
have found some instructions on the SingleUserUnixInstall page.


So I have circled back and checked the Download link from the top. 
There I can download the tar, get hints on Upgrading but still nothing 
on installation.


There is a link at the top of the homepage to "Download" and in the 
tarball on the download page there's a document named "INSTALL"


The overwhelming majority of users who install SA do so using their 
system's packaged version or CPAN.




The Wiki and FAQ links from the top are not helpful either.

So eventually, I have found it on "Docs", pointing to the INSTALL 
file.


From experience, that is not really the first place I would look.


That certainly varies by individual. I definitely look to the 
documentation for information on how to install software.


I would think the "Get Started" page should have a link to the 
Download and INSTALL page at the beginning. Downloading and installing 
seem to be the obvious first steps to get started.


I agree. The whole logical structure of the website needs a more 
rigorous review.



The Download page should have a link for INSTALL like it already has 
for the Upgrade.


And I would say "Where to download" and "How to install" are pretty 
common FAQs, too.


Indeed.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: BayesStore MariaDB on EL9

2024-06-18 Thread Bill Cole

On 2024-06-18 at 14:58:15 UTC-0400 (Tue, 18 Jun 2024 20:58:15 +0200)
Gerald Vogt 
is rumored to have said:


Hi,

for a test, I have increased the column length of token to binary(32) 
and used a test file to import containing a single token.


This time it went through. However, as I suspected, the token length 
is not 5 byte. Token line from backup:


t   1   0   1718024618  027121926a

Hex representation of content in database:

MariaDB [spamassassin]> select hex(token) from bayes_token\G
*** 1. row ***
hex(token): 
027121C2926A

1 row in set (0.000 sec)

Compared:

Original 02 71 21 92 6A
Database 02 71 21 C2 92 6A

C2 92 is the UTF-8 encoding of U+0092, so the token is basically being 
written into the database as UTF-8.

That's odd... What is the character set of the database?
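To answer that, the effective character set can be read straight from the server. These are stock MariaDB/MySQL statements, shown only as a starting point for the diagnosis: a latin1 column combined with a utf8 connection charset is the classic setup for this kind of re-encoding on insert.

```sql
-- Check the defaults at each level: database, server/connection, column.
SHOW CREATE DATABASE spamassassin;
SHOW VARIABLES LIKE 'character_set%';
SHOW FULL COLUMNS FROM bayes_token;
```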

Running sa-learn with DBI_TRACE=2 I can also see that it looks like it 
actually has the UTF-8 encoding already in there during parameter 
binding:


Binding parameters: INSERT INTO bayes_token
   (id, token, spam_count, ham_count, atime)
   VALUES ('43','^Bq!j','1','0','1718024618')
   ON DUPLICATE KEY UPDATE spam_count = 
GREATEST(spam_count + '1', 0),
   ham_count = GREATEST(ham_count 
+ '0', 0),
   atime = GREATEST(atime, 
'1718024618')


Thus, I would say it's not an issue with the database.

Any idea?

Running spamassassin-3.4.6-5.el9.x86_64 on AlmaLinux 9.4.


First: upgrade to 4.0.1

There were substantial changes in how encoding was handled between 3.4.6 
and 4.0, and there is a substantial likelihood that any problem with 
encoding would not occur in 4.0 or later.


I don't know exactly what the cause of the problem is (i.e. why is SA 
trying to write UTF-8 to the database?) but I'm quite sure that an 
official fix for 3.4.x will never happen.






Thanks,

Gerald

On 18.06.24 17:09, Gerald Vogt wrote:

Hi!

I am trying to use a mariadb database as bayesstore, but it fails to 
load tokens. Whenever it tries to insert something into bayes_token 
it fails with an error


dbg: bayes: _put_token: SQL error: Data too long for column 'token' 
at row 1


The table has been created as mentioned in

https://github.com/apache/spamassassin/blob/trunk/sql/bayes_mysql.sql

but the 5 byte binary isn't big enough. I have tried with sa-learn 
--restore as well as learning some spam mails. bayes_token remains 
empty.


MariaDB [spamassassin]> show create table bayes_token\G
*** 1. row ***
    Table: bayes_token
Create Table: CREATE TABLE `bayes_token` (
   `id` int(11) NOT NULL DEFAULT 0,
   `token` binary(5) NOT NULL,
   `spam_count` int(11) NOT NULL DEFAULT 0,
   `ham_count` int(11) NOT NULL DEFAULT 0,
   `atime` int(11) NOT NULL DEFAULT 0,
   PRIMARY KEY (`id`,`token`),
   KEY `bayes_token_idx1` (`id`,`atime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
1 row in set (0.000 sec)

Any idea what goes wrong here?

Thanks,

Gerald





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Where are your test definitions?

2024-06-14 Thread Bill Cole

On 2024-06-14 at 17:33:22 UTC-0400 (Fri, 14 Jun 2024 23:33:22 +0200)
Thomas Barth via users 
is rumored to have said:


Am 2024-06-14 21:20, schrieb Matus UHLAR - fantomas:

grep -ri "FONT_INVIS_NORDNS" /var/lib/spamassassin/ | grep describe
/var/lib/spamassassin/4.00/updates_spamassassin_org/72_active.cf: 
describe FONT_INVIS_NORDNS Invisible text + no rDNS


In my case, I can say with certainty that the mail comes from a 
business partner of a colleague :-)


If you want to find out more, feed the mail to "spamassassin -D" and 
that should explain which text matched which rules.


and as we told you already, your client should NOT play with small or 
semi-invisible text in mail. That's what spamers do.


Cool, but now I've more questions! :-)

When the eMail arrived the score was 6.248. I repeat the testlist:

BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
 DKIM_VALID_EF=-0.1, DMARC_PASS=-0.001, FONT_INVIS_MSGID=2.497,
 FONT_INVIS_NORDNS=1.544, HTML_FONT_TINY_NORDNS=1.514, 
HTML_MESSAGE=0.001,
 RDNS_NONE=0.793, RELAYCOUNTRY_BAD=2, SPF_HELO_NONE=0.001, 
SPF_PASS=-0.001,

 T_KAM_HTML_FONT_INVALID=0.01, T_SCC_BODY_TEXT_LINE=-0.01

But when piping the eMail to spamassassin -D the score is 10.5! And 
RDNS_NONE gets a 1.3!


It is very likely (almost certain...) that your shell account and your 
mail server have different SpamAssassin configurations. Per-user 
configurations are in ~/.spamassassin/user_prefs by default, while the 
settings used by SpamAssassin via whatever glue you are using to hook 
into your MTA really depends on how you do that. Per-user prefs can 
change scores or even scoresets (i.e. using net and bayes or not) so you 
need to figure out which prefs each checking method is using.


A single user also stands a strong chance of not having enough data 
learned into their own Bayes DB for it to be used, while a system-wide 
DB usually will. The above list has a (favorable) BAYES score, the one 
below has none




 2.5 URIBL_DBL_SPAM Contains a spam URL listed in the Spamhaus 
DBL

blocklist
[URI: www.example.com]
[URI: example.com]


That's a rule that is likely to hit on "aged" spam that it did not hit 
earlier, because it can take time for Spamhaus to list spammers like 
example.com... (I assume you've redacted to protect the definitely 
guilty.)




 0.0 SPF_HELO_NONE  SPF: HELO does not publish an SPF Record
 0.1 DKIM_SIGNEDMessage has a DKIM or DK signature, not 
necessarily valid
 0.1 DKIM_INVALID   DKIM or DK signature exists, but is not 
valid
 2.0 RELAYCOUNTRY_BAD   Relayed through spammy country at some 
point

 0.0 HTML_MESSAGE   BODY: Nachricht enthält HTML
-0.0 T_SCC_BODY_TEXT_LINE   No description available.
 1.2 FONT_INVIS_NORDNS  Invisible text + no rDNS
 1.3 RDNS_NONE  Delivered to internal network by a host 
with no rDNS
 0.0 T_KAM_HTML_FONT_INVALID Test for Invalidly Named or Formatted 
Colors

in HTML
 2.5 FONT_INVIS_MSGID   Invisible text + suspicious message ID
 0.0 HTML_FONT_TINY_NORDNS  Font too small to read, no rDNS
 0.9 DMARC_NONE DMARC none policy

Let's just assume that the colleague is corresponding with a spammer


OR: discussing a spammer, with domain names.

and the colleague knows nothing about it. I'm just interested to know 
why the score is lower when the last mail arrived than in the current 
test. Is it because a few hours have already passed and the mail is 
rated differently in the DNS blocklists?


That's the URIBL_DBL_SPAM hit.


Or could it be that something is still wrong with my configuration?


"Wrong" is such a judgy word...
You have variances. Your MTA checks in one way, your shell checks in 
another.


However, I can see in the journal that every mail is checked against 
blocklists (maybe not completely?). This difference is now irritating 
me.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Where are your test definitions?

2024-06-14 Thread Bill Cole

On 2024-06-14 at 10:39:36 UTC-0400 (Fri, 14 Jun 2024 16:39:36 +0200)
Thomas Barth via users 
is rumored to have said:


Hello,

I would like to explain to a sender what he can do to create an email 
that is not classified as spam.

X-Spam-Status: Yes, score=6.248 tagged_above=1 required=5
 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, 
DKIM_VALID_AU=-0.1,

 DKIM_VALID_EF=-0.1, DMARC_PASS=-0.001, FONT_INVIS_MSGID=2.497,
 FONT_INVIS_NORDNS=1.544, HTML_FONT_TINY_NORDNS=1.514, 
HTML_MESSAGE=0.001,
 RDNS_NONE=0.793, RELAYCOUNTRY_BAD=2, SPF_HELO_NONE=0.001, 
SPF_PASS=-0.001,

 T_KAM_HTML_FONT_INVALID=0.01, T_SCC_BODY_TEXT_LINE=-0.01]

I cannot find the definitions on your old site 
https://spamassassin.apache.org/old/tests_3_1_x.html.

FONT_INVIS_NORDNS, FONT_INVIS_MSGID, HTML_FONT_TINY_NORDNS, RDNS_NONE

Is there no current version of the test definitions?



The rules get tested, rescored, and assembled into a release package 
daily, so it is not really feasible to put up static pages with the 
descriptions of all active rules; the set changes every day.


You can either use sa-update to get the current ruleset and find the 
rule descriptions in that package or go through the current files in the 
repo: https://svn.apache.org/viewvc/spamassassin/trunk/rules/ and 
https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Warning: Your Pyzor may be broken.

2024-06-08 Thread Bill Cole
On 2024-06-08 at 15:35:01 UTC-0400 (Sat, 08 Jun 2024 21:35:01 +0200)
Benny Pedersen 
is rumored to have said:

> Bill Cole skrev den 2024-06-08 20:45:
>
>> I've chosen #3 for myself, but it's not great.
>
> is why cpanel provided a perl pyzor client ?

I had forgotten about that. Thank you, Benny.

Using pyzor_perl=1 and pyzor_server_file is absolutely the best option, 
assuming that it works.

> ifplugin Mail::SpamAssassin::Plugin::Pyzor
>
> use_pyzor 1
> pyzor_count_min 1
> pyzor_welcomelist_min 1
> pyzor_welcomelist_factor 0.2
> pyzor_fork 0
> pyzor_perl 1
> pyzor_timeout 120
>
> # pyzor_options options
> # pyzor_path STRING
>
> # pyzor_server_file FILE
> pyzor_server_file /etc/mail/spamassassin/pyzor_server_file.conf
>
> # Pyzor servers configuration file path, used by Pyzor Perl 
> implementation.
> # By default Pyzor will connect to public.pyzor.org on port 24441.
>
> endif # Mail::SpamAssassin::Plugin::Pyzor
>
> i juat got no hits yet


-- 
Bill Cole


Warning: Your Pyzor may be broken.

2024-06-08 Thread Bill Cole
I was working on a mail system today and inadvertently noticed that its Pyzor 
was broken. When I tried to reinstall Pyzor according to the web documentation 
with "pip3 install pyzor" I got what claimed to be v1.0.0 and no complaints 
from the installer but when running the pyzor client tool, it kicked out errors 
indicating to me that the program had not been even trivially updated to work 
with Python 3. I did the absolute hackiest thing I could to make it work 
(blanket s/iteritems/items/ and s/xrange/range/ to address specific error 
messages) and it did so, but that's not acceptable. Neither is reinstalling a 
Python2 world.

I went looking for a better fix and found a reported issue at 
https://github.com/SpamExperts/pyzor/issues/155 matching my original symptoms 
in which a workaround was provided: install directly from the GitHub project's 
master.zip link, i.e. a snapshot assembled from the current state of the repo, 
which claims to be v1.1.1. I do not like that solution at all, and added a 
comment to that issue suggesting that they fix the problem by cutting a release 
for PyPI. No response yet, but it has only been a matter of minutes.

FOR NOW: If you are running a system where Python 2.x no longer exists (that 
should be everywhere...) and you've never confirmed that Pyzor is working for 
you, do so now. If you pipe a message to 'pyzor check' and it gives you a 
response like this you're fine:

   public.pyzor.org:24441   (200, 'OK') 0   0

If instead you get a Python stack trace, obviously it's broken.

I don't feel great recommending any of the obvious mitigations. They are:

1. Install Python 2.7 and pyzor 1.0.0 from PyPI.
2. Hand-patch pyzor 1.0.0 minimally to get it to work with Python 3.
3. Install the head of the development tree from GitHub, whatever that happens 
to be at the moment.

I've chosen #3 for myself, but it's not great.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: [mailop] SORBS Closing.

2024-06-07 Thread Bill Cole
On 2024-06-06 at 19:53:02 UTC-0400 (Thu, 6 Jun 2024 19:53:02 -0400)
J Doe 
is rumored to have said:

[...]

> Hi Rob and list,
>
> Speaking as a small user of SORBS via SpamAssassin 4.0, I assume the
> correct response to disable use of SORBS is to place the following in my
> local.cf file:
>
> dns_query_restriction deny sorbs.net
>
> Is that correct and is there any additional portions of local.cf I need
> to configure so that I am no longer consulting SORBS ?

You do not even need to do that.

All SORBS-referencing rules were removed from the updates.spamassassin.org 
rules channel earlier this week. Scanning the latest deployed (by sa-update) 
version r1918114 I see no surviving references to SORBS.



-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: RCVD_IN_RP_CERTIFIED always -3

2024-06-07 Thread Bill Cole
On 2024-06-06 at 12:08:54 UTC-0400 (Thu, 6 Jun 2024 18:08:54 +0200)
 
is rumored to have said:

> Thanks for your answer Harald.
>
> Regarding "there is no such configuration option in SpamAssassin": The conf 
> snippet I posted below comes from the repository, however it's an older 
> version, which still is supported by Ubuntu 20.04.06 LTS and can be installed 
> from their related archive (at least my rules were last updated in March '23).
> https://github.com/apache/spamassassin/blob/spamassassin_release_3_4_4/trunk-only/rules/20_dnsbl_tests.cf
>  (the same is used up to 3.4.6)

Note that the Github repository is a courtesy replica for people who don't want 
to learn Subversion, and it is NOT authoritative. We do not support using 
Github to install SpamAssassin in any way. You can try it but you're on your 
own.

As for grabbing rules from ancient history in Github, that is just a recipe for 
disaster. The rules are updated daily and packaged for distribution directly 
from the ASF and our SA-only mirrors using sa-update. Rules change for many 
different reasons, including changes in how 3rd-party data providers like 
Validity (formerly ReturnPath) operate.

> I should have written I'm on an older Ubuntu, might have helped to avoid 
> confusion.

If Ubuntu told you to update rules from Github, you should consider a better 
distro...

(I strongly doubt that they did...)

> Regarding the SpamAssassin 4.x rules - are they backward compatible to 3.4.4?

Yes.

As well-documented in the SpamAssassin documentation, the correct way to keep 
your rules and their scores up-to-date is to run the sa-update tool daily. It 
is part of the distribution. Rules in the standard "updates.spamassassin.org" 
channel are maintained to be backwards compatible, with rules that use newer 
features being tested for availability before load.
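As an illustration, a nightly cron entry along these lines does the job (a sketch only: the reload command and service name vary by distro and by how SA is glued to your MTA):

```
# Run sa-update nightly; exit status 0 means new rules were installed,
# so reload spamd to pick them up.
30 3 * * *  sa-update && systemctl reload spamassassin
```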

HOWEVER: Running 3.4.4 is a bad idea. Unless it has extensive backports of 
patches from more modern versions, it is going to miss a lot of spam and run 
very inefficiently. This is especially true if you use rulesets from that era, 
which have known (and fixed in trunk) runaway problems and obsolete DNSBL 
configs.

There may also be a problem running sa-update from 3.4.4 because we have 
abandoned SHA1 signatures. I'm not sure if 3.4.4 included the changes that 
switch to more secure hashes.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: DKIM length 'l=' tag

2024-06-03 Thread Bill Cole
On 2024-06-03 at 07:05:29 UTC-0400 (Mon, 3 Jun 2024 12:05:29 +0100 
(BST))

Andrew C Aitchison 
is rumored to have said:


The DKIM RFC
   https://datatracker.ietf.org/doc/html/rfc6376#section-8.2
tells us that it is not safe to rely on the DKIM length (l=) tag


Never has been safe. Terrible idea from the start. Never should have 
been included in the specification.



and
   https://www.zone.eu/blog/2024/05/17/bimi-and-dmarc-cant-save-you/
shows how it can be used to subvert BIMI*.


I can't honestly say that I care. BIMI is a misguided concept useful 
only to marketers and the mythological creatures they call "consumers" 
who behave unlike many real humans.


I am looking at extending Mail::SpamAssassin::Plugin::DKIM to indicate 
when a DKIM body signature only covers part of the message body
and how much of the body is unsigned (bytes, percentage or possibly 
both).


I was thinking of the same thing in a half-assed way, just catching 
anything using the length tag. I'd bet that correlates to spam but we'd 
need data to prove that.


I am new to the spamassassin code, so any comments or suggestions would 
be welcome.


Resist the urge to refactor. It's easy to break things.


* I am not a fan of BIMI, but big name players appear to be using
it to display "trustable" logos on GUI mail clients, so users *will*
be caught when it breaks.


The concept that users should learn to trust logos as authentication per 
se is harmful. BIMI should be broken now and with every opportunity 
available. It is an indicator that a MUA author puts the interests of 
marketers ahead of the interests of users.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Lots of FN because of VALIDITY* rules

2024-06-03 Thread Bill Cole

On 2024-06-03 at 08:35:32 UTC-0400 (Mon, 3 Jun 2024 14:35:32 +0200)
postgarage Graz IT 
is rumored to have said:

I think that the active.list file should be updated, when there are 
new rules, shouldn't it?


It is updated where it is actually used, on the ASF rule maintenance 
system. It is irrelevant to an operational deployment.


I have no idea why Debian installs that file at all.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Lots of FN because of VALIDITY* rules

2024-06-03 Thread Bill Cole

On 2024-06-03 at 01:26:31 UTC-0400 (Mon, 3 Jun 2024 07:26:31 +0200)
postgarage Graz IT 
is rumored to have said:


Now for my questions:
*) as is stated in active.list it should not be edited. What's the 
correct place to add the new rules to activate them? local.cf?


Yes. In your local version of local.cf, typically in 
/etc/mail/spamassassin. This is as documented. Run "perldoc 
Mail::SpamAssassin::Conf" for the core configuration documentation.


Note that active.list is part of the rule management toolkit and IS NOT 
part of normal operations. It is part of the ruleset-building system 
that we use to create the daily update packages. In theory anyone could 
use that system to maintain rules and scores locally based on local 
data, as we do to produce daily updates, but I do not believe that 
anyone (literally *anyone*) other than the ASF does that.



*) If I understand it correctly
/var/lib/spamassassin/4.00/updates_spamassassin_org/ is updated by 
the SA update mechanism but it's the Linux distribution's 
responsibility to update /var/lib/spamassassin?


I'm not 100% clear on what that question means, perhaps because Debian 
does something different in /var/lib/spamassassin. The standard 
sa-update program will create versioned subdirectories under 
/var/lib/spamassassin/ as needed and channel subdirectories inside of 
them.



In that case should I fill a Debian bug? Or should the SA updates also 
include the file active.list?


SA updates include the active rules list in the form of the 72.active.cf 
file. The active.list file is not part of normal operations.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: TxRep does not read setting|default value

2024-05-30 Thread Bill Cole
On 2024-05-30 at 03:58:18 UTC-0400 (Thu, 30 May 2024 16:58:18 +0900)
Tomohiro Hosaka 
is rumored to have said:

> Hello.
>
> The code seems to be wrong.

I do not believe that to be so. See lines 340-347 in TxRep.pm.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: "deadline shrunk" in logs ?

2024-05-27 Thread Bill Cole
On 2024-05-27 at 17:43:43 UTC-0400 (Mon, 27 May 2024 17:43:43 -0400)
J Doe 
is rumored to have said:

> Hi list,
>
> Sometimes when I am checking my e-mail server logs, SA will note
> "deadline shrunk":
>
> May 27 12:56:07 server spamd[29305]: async: aborting after 4.253 s,
> deadline shrunk: DNSBL, A/106.55.47.104.dnsbl.sorbs.net, rules:
> RCVD_IN_SORBS_DUL, __RCVD_IN_SORBS
>
> What does the expression "deadline shrunk" mean ?


It means that for some reason, the abort_remaining_lookups() function was 
called before all pending DNS queries were complete and before the fixed 
timeout deadline was reached. The most common cause is a DNS-based rule 
configured to shortcircuit while other queries are outstanding.
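For illustration, a welcomelist shortcircuit of this kind looks like the sketch below (the stock USER_IN_WELCOMELIST rule is used as an example; the priority value is arbitrary). When it fires, scanning stops and any DNS lookups still in flight are aborted:

```
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  # Stop all further rule evaluation as soon as this rule hits,
  # aborting outstanding DNSBL queries ("deadline shrunk" in the log).
  shortcircuit USER_IN_WELCOMELIST on
  priority     USER_IN_WELCOMELIST -1000
endif
```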

-- 
Bill Cole


Re: Extract Local-part from To: Adress to use in spamassassin rule

2024-05-23 Thread Bill Cole

On 2024-05-23 at 03:40:48 UTC-0400 (Thu, 23 May 2024 09:40:48 +0200)
Carsten 
is rumored to have said:


Hi @all,

I want to create a SpamAssassin rule that checks if the subject line 
of an email contains the local part of the recipient's email address 
(the part before the @ symbol). For example, if the recipient's email 
address is i...@example.com, I want to check if the subject contains 
the phrase "info lorem ipsum". If the recipient's email address is 
foo...@example.com, I want to check if the subject contains the 
phrase "foobar lorem ipsum". The rule should be general and adaptable 
to different local parts of email addresses.


*Requirements:*

1. Extract the local part of the recipient's email address from the
   To header.
2. Use the extracted local part to check if it is present in the
   Subject header.
3. The rule should be written in a way that works for any local part
   of the email address, not just a specific one.


See the section titled "CAPTURING TAGS USING REGEX NAMED CAPTURE GROUPS" 
in the embedded configuration documentation (perldoc 
Mail::SpamAssassin::Conf) for how to capture a pattern in one rule and 
use it in another. I don't have a working rule for you, but that's the 
mechanism I would use.
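An untested sketch of that mechanism follows. The rule names, the tag name TOLOCAL, and the exact capture pattern are my assumptions; verify the tag-interpolation syntax against the perldoc section cited above before relying on it:

```
# Capture the local part of the first To: address into the tag TOLOCAL.
header   __TO_LOCALPART     To =~ /<?(?<TOLOCAL>[^@<\s]+)@/
# Reuse the captured tag in a second rule against the Subject.
header   SUBJ_HAS_LOCALPART Subject =~ /%{TOLOCAL}/i
describe SUBJ_HAS_LOCALPART Subject contains the recipient local part
score    SUBJ_HAS_LOCALPART 0.1
```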




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: double backslash in the log messages

2024-05-21 Thread Bill Cole

On 2024-05-21 at 11:00:57 UTC-0400 (Tue, 21 May 2024 17:00:57 +0200)
Vincent Lefevre 
is rumored to have said:


While testing a rule with SpamAssassin 4.0.0 under Debian/stable
(I wasn't aware of allow_user_rules yet, but this is not the issue
I'm reporting):

2024-05-21T16:42:42.792136+02:00 joooj spamd[219339]: config: not 
parsing, 'allow_user_rules' is 0: header LOCAL_TO_LORIA ToCc =~ 
/loria\\.fr/i
2024-05-21T16:42:42.793753+02:00 joooj spamd[219339]: config: failed 
to parse line in /srv/d_joooj/home/vinc17/.spamassassin/user_prefs 
(line 192): header LOCAL_TO_LORIA ToCc =~ /loria\\.fr/i


while I just had /loria\.fr/i (with a single backslash) in my
user_prefs config file.

Is there a reason to have a double backslash in the log messages
or is this a bug?


It is intentional to assure that log messages (which may include strings 
from tainted sources) have all common meta-characters escaped.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Difference between spamc -L and sa-learn

2024-05-18 Thread Bill Cole

On 2024-05-18 at 10:26:54 UTC-0400 (Sat, 18 May 2024 16:26:54 +0200)
Francis Augusto Medeiros-Logeay 
is rumored to have said:


Hi,

Is there any difference between using spamc -L and sa-learn ?


Yes. The compiled-C spamc binary loads no Perl, it just talks over a 
socket to spamd, which is always running and so always has the advantage 
of a warmed-up i/o cache and a permanently loaded set of Perl code 
objects pre-compiled and in RAM; sa-learn has to open and compile all of 
the needed SA Perl code on every launch.



I noticed that the later is way slower.


Yes, it is. It is quite expensive to execute perl and have it load the 
many SpamAssassin modules needed to learn a message.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Error parsing sql configuration

2024-05-18 Thread Bill Cole

On 2024-05-18 at 10:25:28 UTC-0400 (Sat, 18 May 2024 16:25:28 +0200)
Francis Augusto Medeiros-Logeay 
is rumored to have said:


Hi,

I use Spamassassin 4 on Ubuntu 24.04.

I have configured SQL for storing user preferences. Things work fine, 
but I am getting these errors on my logs:


Sat May 18 16:22:21 2024 [75733] info: config: not parsing, 
administrator setting: use_pyzor\t1
Sat May 18 16:22:21 2024 [75733] info: config: failed to parse line in 
(sql config) (line 23): use_pyzor\t1
Sat May 18 16:22:21 2024 [75733] info: config: not parsing, 
administrator setting: use_razor2\t1
Sat May 18 16:22:21 2024 [75733] info: config: failed to parse line in 
(sql config) (line 28): use_razor2\t1


My query is pretty standard:

user_scores_sql_custom_query SELECT preference,value FROM 
spam_assassin_userpref WHERE username = _USERNAME_ OR username = 
'$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC


Is there a bug when parsing the preferences from sql?


It's not really a parsing error, it's a configuration error. You cannot 
set "use_pyzor" or "use_razor" in user preferences, as they are both 
restricted to system-wide config.
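So those two settings belong in the system-wide configuration and should be removed from the per-user SQL rows, e.g.:

```
# Admin-only settings -- valid in /etc/mail/spamassassin/local.cf,
# rejected when loaded from per-user (SQL) preferences:
use_pyzor  1
use_razor2 1
```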





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: SA treats percentage spaces wording as uri

2024-05-14 Thread Bill Cole

On 2024-05-13 at 20:09:33 UTC-0400 (Tue, 14 May 2024 10:09:33 +1000)
Noel Butler 
is rumored to have said:

This morning one of our ent_domains DMARC weekly report from a third 
party was listed as spam by SA which took the wording  
Not_percent-twenty_Resolved and passed it off to URI checks adding 
dot.com to it when there is no dot com after it, and a raw message 
search of that message in less in console confirms it.


Context is important. If SA is mis-parsing a message, we really need to 
see the message to understand why. There's nothing obviously magic about 
that string.



Problem with the code that scans the content for things like URI's?


Likely.

That code is intentionally loose. It is intended to turn anything that 
any MUA might consider a clickable link into the same functional URI 
that a MUA would. This creates a fundamental tension between 
completeness and correctness. SA leans towards completeness but if it is 
doing something harmful we'd like to fix that. It would be particularly 
important to fix it if the result was a hit on a substantial rule, but 
it is not as important to avoid checking bogus URIs that will never hit 
anything anyway.



It shouldn't be assuming there's a TLD after it.


I agree. That's a step too far. The days when appending .com was a 
reasonable tactic for qualifying hostnames are long gone.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: dkim https://16years.secvuln.info/

2024-05-13 Thread Bill Cole

On 2024-05-13 at 08:09:04 UTC-0400 (Mon, 13 May 2024 14:09:04 +0200)
Benny Pedersen 
is rumored to have said:

i write here so in hope to start a debate on it, is there a code 
change anywhere to handle this ?


That's not a SA issue. Nothing SA does can fix it.

The change (in Debian) that fixed that vulnerability was released 16 
years ago. It is up to sysadmins to pay attention and deploy fixes when 
they are available.  If people are still using bad keys generated 16 
years ago, they are failing to do that. We can't fix it.


The problem being cited in 2024 is 16 years of incompetent system 
administration, not bad code or distribution config.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Score 0.001

2024-05-11 Thread Bill Cole

On 2024-05-11 at 14:26:59 UTC-0400 (Sat, 11 May 2024 20:26:59 +0200)
Thomas Barth 
is rumored to have said:


Hello

On 2024-05-11 19:24, Loren Wilton wrote:

Can I just take the names of the rules?

e.g. at least two checks should fire:

meta MULTIPLE_TESTS (( RAZOR2_CF_RANGE_51_100 + RAZOR2_CHECK + 
URIBL_ABUSE_SURBL) > 1)

score MULTIPLE_TESTS 1

found in

X-Spam-Status: No, score=5.908 tagged_above=2 required=6.31
tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
DKIM_VALID_EF=-0.1, DMARC_PASS=-0.001, FSL_BULK_SIG=0.001,
HTML_MESSAGE=0.001, RAZOR2_CF_RANGE_51_100=2.43, RAZOR2_CHECK=1.729,
SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_ABUSE_SURBL=1.948]


Why is your score threshold for spam 6.31? By default it is 5, and 
that message would have been spam.


6.31 has been the default value on a Debian system for ages and is 
based on the experience of the “spam analysts”. That's how I 
remember it. I have therefore retained this value. Who introduced the 
default value of 5? Spamassassin itself, because spam is getting 
better and better and fewer rules apply?


5.0 has been the default threshold in the distribution forever and that 
value is an assumption in the dynamic scoring and RuleQA service which 
adjusts scores to their optimal values daily based on the latest results 
submitted by masscheck contributors.


I have no idea who the Debian "spam analysts" are but I am certain that 
they are not doing any sort of data-driven dynamic adjustments of scores 
based on a threshold of 6.3 nor are they (obviously) adjusting that 
threshold daily based on current scores. The only reason I can see for 
boosting the threshold is if there is an additional set of rules being 
used with a significant number of the non-standard low-S/O rules. For 
example, if you use KAM rules (which are not part of the RuleQA process) 
you will have a lot of rule hits on legit mail and you can either boost 
the threshold or do a lot of local-specific FP mitigation.


On systems I manage I mostly use a *lower* threshold, because I apply 
more active site-specific rule management (and FP avoidance) than most 
systems ever receive.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Score 0.001

2024-05-10 Thread Bill Cole
On 2024-05-10 at 14:15:56 UTC-0400 (Fri, 10 May 2024 14:15:56 -0400)
Bill Cole 
is rumored to have said:

> On 2024-05-09 at 18:19:14 UTC-0400 (Thu, 9 May 2024 15:19:14 -0700)
> jdow 
> is rumored to have said:
>
>> On 20240509 15:05:46, Thomas Barth wrote:
>>> On 2024-05-09 21:41, Loren Wilton wrote:
>>>> Low-score tests are neither spam nor ham signs by themselves. They can be 
>>>> used in metas in conjunction with other indicators to help determine ham 
>>>> or spam. A zero value indicates that a rule didn't hit and the sign is not 
>>>> present. A small score indicates that the rule did hit, so the sign it is 
>>>> detecting is present.
>>>
>>> 0.001 seems to be the default lowest value. Is it possible to change it to 
>>> 0.01 or 0.1?
>
> Sure. It's just a number.

Clarifying; You can change any score yourself on your own system locally if you 
like, but to make no rule ever score 0.001 you'd need to fix the scores for all 
low-score rules every time that you run sa-update. As John Hardin says, we will 
not be changing the default to 0.1 in the rules distribution; that would be too 
significant a value. I also think that there is value in having matched rules 
showing up in the long form (folded header) of the SA report with "0.0" if they 
are intended to have no direct impact on the ham/spam decision.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Score 0.001

2024-05-10 Thread Bill Cole
On 2024-05-10 at 11:00:45 UTC-0400 (Fri, 10 May 2024 08:00:45 -0700 (PDT))
John Hardin 
is rumored to have said:

> Note that poorly-performing rules may get a score that looks informational, 
> but that may change over time based on the corpora.

IOW: rules that in themselves are not good enough performers to get included in 
the daily active list will still be pulled into the active list with a trivial 
score if derivative meta rules which are good enough for real scores depend on 
them.

-- 
Bill Cole


Re: Score 0.001

2024-05-10 Thread Bill Cole
On 2024-05-09 at 18:19:14 UTC-0400 (Thu, 9 May 2024 15:19:14 -0700)
jdow 
is rumored to have said:

> On 20240509 15:05:46, Thomas Barth wrote:
>> On 2024-05-09 21:41, Loren Wilton wrote:
>>> Low-score tests are neither spam nor ham signs by themselves. They can be 
>>> used in metas in conjunction with other indicators to help determine ham or 
>>> spam. A zero value indicates that a rule didn't hit and the sign is not 
>>> present. A small score indicates that the rule did hit, so the sign it is 
>>> detecting is present.
>>
>> 0.001 seems to be the default lowest value. Is it possible to change it to 
>> 0.01 or 0.1?

Sure. It's just a number.

> 1) This cyberunit is unwarrantedly curious, why does this matter to you?
>
> 2) Probably not as  it may be related to how perl handles numbers.

Not so much. SA has no need for high-precision floating-point math so there is 
nothing special about 0.001 or 0.0001 or any other small number.

The reason for such low scores is to assure that the rule is checked, even if 
no other rule depends on it. Such rules usually are a component in multiple 
other meta rules that have more significant scores, but are not significantly 
spam or ham signs on their own.
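A sketch of the pattern (all rule names, patterns, and scores below are illustrative, not real distributed rules):

```
# The near-zero score guarantees the rule is always evaluated and
# reported, without materially moving the total on its own.
header LOW_SIGNAL   Subject =~ /free money/i
score  LOW_SIGNAL   0.001

# The meta that combines it with another indicator carries the
# real weight.
meta   STRONG_COMBO LOW_SIGNAL && RAZOR2_CHECK
score  STRONG_COMBO 2.0
```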

-- 
Bill Cole


Re: Rule: "1.0 R_DCD 90% of .com. is spam"

2024-05-10 Thread Bill Cole
On 2024-05-10 at 11:08:53 UTC-0400 (Fri, 10 May 2024 15:08:53 +)
Rupert Gallagher 
is rumored to have said:

> R_DCD

That string does not occur anywhere in the SpamAssassin distribution, neither 
in the code nor in the rules, *including* the rules that are not currently 
performing well enough to be in the active list.

If your system generated that hit, it is one of your own local rules. If it 
came from elsewhere, ask them.



-- 
Bill Cole


Re: Whitelist rules should never pass on SPF fail

2024-05-10 Thread Bill Cole
On 2024-05-09 at 17:21:07 UTC-0400 (Fri, 10 May 2024 07:21:07 +1000)
Noel Butler 
is rumored to have said:

> So what? domain owners state hard fail it SHOULD be hard failed, irrespective 
> of if YOU think you know better than THEM or not, if we hardfail we accept 
> the risks that come with it.

In principle, that is fine (as a demonstration of why some principles are 
pointless and do more harm than good...)

In practice, there is a question of whose wishes I prioritize on the 
receiving systems I work with. If my customer wants to receive the mail and the 
individual generating the mail is not generating that desire fraudulently, I 
don't care much about what the domain owner says. I do not work for the domain 
owners of the world and I am not obligated to enforce their usage rules on 
their users. Obviously I take their input seriously when trying to detect fraud 
but I've seen too many cases of "-all" being used with incomplete or obsolete 
lists of "permitted" hosts to accept that they know all of the places their 
mail gets generated.

I've also given up all hope that the few places still doing transparent 
forwarding will ever adopt SRS or any other mechanism to avoid SPF 
breakage. There is no ROI in trying to fix such cases 
individually but users still want their college email addresses to work decades 
after graduating and some colleges have pandered to them. So have some 
professional orgs.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Using -t to test rule changes

2024-05-09 Thread Bill Cole

On 2024-05-08 at 19:18:28 UTC-0400 (Wed, 8 May 2024 19:18:28 -0400)
Alex 
is rumored to have said:

Hi, I'm using the latest version of SA from trunk (although I don't
think that matters) and trying to make adjustments to rules on a
particular false-positive email that was quarantined by amavis so I
can adjust the rules to prevent it from being quarantined.

The problem is that amavis manipulates the headers to prevent me from
being able to process them with spamassassin -t again.

I've tried using -d to remove the previous reports first, adding the
envelope-from and return-path but SPF fails, of course, and it also
prints twice the triggered rules, one set after the other.

What can be done to be able to process a quarantined email again so I
can make adjustments to prevent it from being quarantined?


I do not know a specific fix, as I don't use Amavis.
You need to figure out what Amavis does to mail BEFORE doing a 
SpamAssassin check (e.g. add a placeholder Received header) and what it 
does AFTER the check (encapsulate? prefix? mangle? I do not know...) 
when it quarantines the message.   To recover the original, you need to 
undo the quarantine changes and redo the pre-check prep. It may be 
relevant what you have set report_type to in your local config.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Whitelist rules should never pass on SPF fail

2024-05-09 Thread Bill Cole

On 2024-05-09 at 08:37:06 UTC-0400 (Thu, 09 May 2024 14:37:06 +0200)
Benny Pedersen 
is rumored to have said:


Bill Cole skrev den 2024-05-09 14:22:

In fact, I can't think of any whitelist test that should pass if SPF 
fails.


If you operate on the theory that a SPF failure is always a sign of 
spam, you can make your SpamAssassin always trust SPF failures 
absolutely. I would not recommend that. Some people screw up their 
SPF records. Other people forward mail transparently, which reliably 
breaks SPF. SPF is broken *by design* as a spam control tool AND as a 
mail authentication tool. We knew this 20 years ago, but it remains a 
useful tool if you work with its limits rather than assuming that 
they do not exist.


spf domain owner asked for hardfails, so why not score spf_fail as 100? :)


I believe that has been covered in extreme detail and redundancy here 
and in other email-related fora MANY times over the past 20 years.


Domain owners do not KNOW all the paths their mail follows, even when 
they think that they do. Users frequently find ways to break SPF without 
doing anything wrong.



on the other hand if spf domain owner asked for softfails it would 
still not be 100


but i still suggest to report to dnswl, if not dnswl none listed


Reasonable advice.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Whitelist rules should never pass on SPF fail

2024-05-09 Thread Bill Cole

On 2024-05-08 at 15:53:47 UTC-0400 (Wed, 08 May 2024 16:53:47 -0300)
kurt.va1der.ca via users 
is rumored to have said:

I received a (relatively) well crafted Phishing email today.  It was 
clearly a well planned campaign.  The Spamassassin score was as 
follows:


X-Spam-Status: No, score=-0.4 required=5.0 tests=GOOG_REDIR_NORDNS=0.001,
HTML_FONT_LOW_CONTRAST=0.001,HTML_MESSAGE=0.001,
NORDNS_LOW_CONTRAST=0.001,RCVD_IN_DNSWL_HI=-5,RDNS_NONE=1.274,
SPF_FAIL=0.919,SPF_HELO_NONE=0.001,URIBL_BLOCKED=0.001,WIKI_IMG=2.397
autolearn=disabled version=3.4.6

DNS white-hole list checks should never ever pass if the SPF checks 
fail.


The only "white-hole" item there is RCVD_IN_DNSWL_HI, which is a 
DNS-based list where IPs which are supposedly "good" can be listed, i.e. 
it is external to SA, not something we manage. You are suggesting that 
the knowledge that an IP does not send spam should be entirely ignored 
if that IP offers a message which fails SPF, which is a solely a domain 
verification and has well-known common failure modes.


I could not disagree more. One purpose in principle for IP-wise 
welcomelisting like DNSWL is to identify known-good transparent 
forwarders who for whatever reason do not implement SRS but also do not 
forward spam.


DNS-based list IP tests are scored in the default distribution without a 
strong  basis, because they do not normally get handled by the RuleQA 
process. It has often been reported here that RCVD_IN_DNSWL_HI is too 
forgiving and that seems true to me. You may wish to reduce its positive 
power. I set it to -2 based on my local observations. YMMV.
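In local.cf that is a one-line change (the -2 is my local choice, not a general recommendation):

```
# Soften the stock -5 bonus for the DNSWL "HI" trust tier
score RCVD_IN_DNSWL_HI -2
```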


You are free to create a local meta-rule which makes SPF_FAIL cancel out 
RCVD_IN_DNSWL_HI. You are free to make the SPF_FAIL score higher. You 
are free to use the priority and shortcircuiting features to assure that 
SPF_FAIL causes DNSWL checks to not be run. I would not expect any of 
these to have an overall positive effect on your email.


In fact, I can't think of any whitelist test that should pass if SPF 
fails.


If you operate on the theory that a SPF failure is always a sign of 
spam, you can make your SpamAssassin always trust SPF failures 
absolutely. I would not recommend that. Some people screw up their SPF 
records. Other people forward mail transparently, which reliably breaks 
SPF. SPF is broken *by design* as a spam control tool AND as a mail 
authentication tool. We knew this 20 years ago, but it remains a useful 
tool if you work with its limits rather than assuming that they do not 
exist.


I could attach a higher score to SPF_FAIL, but that would unduly 
affect cases where the sender wasn't white listed.


I fail to see how that's a problem, in a world where SPF failure 
overrides an IP-based welcome list. However, I do not understand that 
world in general, so I'm sure there's something I'm missing...


I need a way to force Spammassassin to negate the effect of one test 
on the passing of another.


A simple logical problem:

 score RULE_A 3
 score RULE_B -2

 meta  CANCEL_B_IF_A  RULE_A && RULE_B
 score CANCEL_B_IF_A  2

You can also use 'priority' directives to make rules execute in a 
defined order  and a 'shortcircuit' directive to make SA stop processing 
later rules if a specific rule hits. This will also skip any other 
'late' checks, so you have to set priorities with care to avoid 
shortcircuiting rules that you want checked. Consult the docs for 
details.
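An untested sketch of that approach (the priority value is arbitrary; check the Shortcircuit plugin documentation before deploying, since everything after the shortcircuited rule is skipped):

```
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  # Evaluate SPF_FAIL early and stop all later checks when it hits,
  # which prevents the DNSWL lookups from ever running.
  priority     SPF_FAIL -500
  shortcircuit SPF_FAIL on
endif
```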




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Tips for improving bounce message deliverability?

2024-04-24 Thread Bill Cole
On 2024-04-24 at 12:27:01 UTC-0400 (Wed, 24 Apr 2024 18:27:01 +0200)
Benny Pedersen 
is rumored to have said:

>> For example, it matches on
>> *  3.1 URI_IMG_CWINDOWSNET Non-MSFT image hosted by Microsoft Azure
>> infra, possible phishing
>
> this is not in spamassassin core rules

Yes, it is:

updates_spamassassin_org # grep -n '[^A-Z]* URI_IMG_CWINDOWSNET' *
72_active.cf:5635:##{ URI_IMG_CWINDOWSNET
72_active.cf:5637:meta   URI_IMG_CWINDOWSNET 
__URI_IMG_CWINDOWSNET && !__RCD_RDNS_SMTP && !__REPTO_QUOTE && !__URI_DOTEDU
72_active.cf:5638:#score  URI_IMG_CWINDOWSNET 3.500 # limit
72_active.cf:5639:describe   URI_IMG_CWINDOWSNET Non-MSFT image 
hosted by Microsoft Azure infra, possible phishing
72_active.cf:5640:tflags URI_IMG_CWINDOWSNET publish
72_active.cf:5641:##} URI_IMG_CWINDOWSNET
72_scores.cf:408:score URI_IMG_CWINDOWSNET   3.136 
3.060 3.136 3.060

It is being drawn in from John Hardin's sandbox, where he committed the rule on 
2024-01-21 in r1915356

>>  *  2.6 HOSTED_IMG_DIRECT_MX Image hosted at large ecomm, CDN or
>> hosting
>>  *  site, message direct-to-mx
>
> also not in default rule sets

Also NOT TRUE. That one is in the same sandbox source and was last tweaked in 
r1915433 on 2024-01-28

>> It also matches on ANY_BOUNCE_MESSAGE and BOUNCE_MESSAGE. Should metas
>> be created to avoid adding the above scores?
>>
>> What more can be done to improve deliverability of these messages?
>> Perhaps this is something postfix can identify and bypass scanning?
>
> it matches bounces since it is a bounce; alternatively, that can be seen as 
> a result of forwarding emails


More helpfully, it is possible to exempt bounces from filtering by 
SpamAssassin, a trick that is accomplished by whatever mechanism you use to 
'glue' SA and your MTA (postfix, I assume...) not by SA itself. In the case of 
postfix, there are about a half-dozen mechanisms one can use so I can't say for 
sure. However, in general, if you are using a milter interface you must do the 
discrimination in the milter, while other glue mechanisms can provide selective 
filtering in postfix (at the cost of doing it within postfix.)

A message which matches BOUNCE_MESSAGE (and hence also ANY_BOUNCE_MESSAGE) is 
fairly unlikely to be spam, but we have pegged the scores for all the 
*BOUNCE_MESSAGE rules at 0.1 just to make sure that they are always published 
and visible as control points that can be used by sites that have a particular 
need to accept (or shun) some or all bounces.
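For a site with such a need, a local score override is all it takes; a sketch (rule names from the default ruleset, scores purely illustrative):

```
# Hypothetical local overrides: a site that wants to shun bounces
# outright could raise these rules from their published 0.1
score BOUNCE_MESSAGE      5.0
score ANY_BOUNCE_MESSAGE  5.0
```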

-- 
Bill Cole


Re: Defining what the default welcomelist means

2024-04-14 Thread Bill Cole


I believe we are in solid agreement, a few notes below explaining how...


On 2024-04-14 at 08:00:19 UTC-0400 (Sun, 14 Apr 2024 08:00:19 -0400)
Greg Troxel 
is rumored to have said:

> Bill Cole  writes:
>
>> On 2024-04-12 at 18:56:15 UTC-0400 (Fri, 12 Apr 2024 18:56:15 -0400)
>> Greg Troxel 
>>
>>> Bill Cole  writes:
>>>
>>>> 1. We serve our users: receivers, not senders. Senders claiming FPs
>>>> need the support of a corroborating would-be receiver.
>>>
>>> Agreed.  Or maybe we take requests to add only from receivers.
>>
>> Effectively, yes. Senders won't refrain from requesting to be welcomed
>> by default just because we say we don't accept those requests. Only
>> receivers can corroborate the existence of any FP problem which would
>> be solved by a default welcomelist entry, and this isn't a 'just find
>> one example' sort of issue.
>
> They won't refrain from writing, but it's fair to not let them open bugs
> or have bugs open in the tracker.  And to tell them
>
>   1) clean up your mail
>
>   2) we only take requests for defwl from actual receivers, so we're
>   done with this conversation.  use of sock puppets is not ok.
>
> That's what I meant by "not take requests from".

Right. Anyone can open a bug, but we enthusiastically close those that are invalid.


[...]
>> I don't see this as misaligned, but rather a way of saying that def_w*
>> entries come behind site-local receiver mitigations and
>> receiver/sender collaboration on fixing the shabby mail.
>
> What I was trying to express is that senders, even zero-spam
> senders, are often enormous, opaque, and intractable.  So while I agree
> in theory, I guess the real question is whether we want to say to a
> receiver:
>
>   your non-spam mail is spammy, and we aren't going to add a defwl
>   because first you need to get e.g. Bank of America to stop sending
>   html mail.
>
> or
>
>   your non-spam mail is spammy and it's ok to add a defwl
>
> I have occasionally complained to BigCorp and it has never been useful.
> Sure, one can get the branch manager to reverse a fee, but I mean one
> cannot get them to change their practices.

Right. That's why there need to be alternatives to making the mail look less 
spammish. No one is required to persuade bank execs to behave differently...

[...]
> But I don't mean generally/vaguely.  I mean senders that are zero-spam
> and likely important to receivers, in the bank/airline notification (and
> similar) class.  Meaning something with real-world consequences that is
> timely.  Not newsletters.

Right.


> FWIW, I have given up on the KAM rules.  The scores are insanely high
> for things that appear in ham, and I was having too-frequent
> misclassification.  Some of the scores were triggering on things which
> are not even objectively spammy, e.g a watch rule on a technical
> discussion of clocks where it was on topic and I was subscribed.

That's a rabbithole of a different nature.
My point in mentioning the KAM channel was as an example of a local choice 
outside of the default deployment which has a radical effect on FPs. Akin to 
lowering the threshold to 3.0

(FWIW: I think the KAM rules are fine if, like PCCC, you have a staff of 
antispam experts and a mature package of customer-facing and staff-facing tools 
and processes to minimize and mitigate FPs. I use them personally, but I have a 
robust warren of ways for mail to get around SA analysis... )

[...]

>>> I am extremely skeptical of anything that smells of email marketing
>>> here.  I would expect only places sending transactional mail and alerts
>>> to established customers.
>>
>> I share the skepticism, but I have been working with business
>> customers and their love of other businesspeople's email marketing
>> (and random non-work-related email...) for long enough that I have
>> stopped arguing with the nature of email that people eagerly desire in
>> their mailboxes. I care that it is contextually safe, legal, and
>> solidly consensual. There are marketers who stay inside the lines.
>
> If it's really 100% ok, fine.  I just said that I'm skeptical and thus
> require more convincing from and ESP than from bank alerts, to overcome
> a presumption of "email marketing is rarely ok".

Yes, I don't foresee ever seriously considering the addition of any 
marketing-oriented ESP per se to the default welcomelist. They all sometimes 
send spam.

The Microsoft case is an example. The entry I removed matched any subdomain of 
microsoft.com, triggered by spam from an address at email.microsoft.com which 
came to me from a Marketo IP address. Marketo sends a LOT of spam. Marketo 
generally has no listing of its own.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Defining what the default welcomelist means

2024-04-13 Thread Bill Cole
On 2024-04-12 at 19:26:59 UTC-0400 (Fri, 12 Apr 2024 16:26:59 -0700)
jdow 
is rumored to have said:

> On 20240412 16:14:44, Greg Troxel wrote:
>> jdow  writes:
>>
>>> One pesky detail still exists. There is a very broad fuzzy area where
>>> my spam is your ham and vice versa. You could probably drive yourself
>>> to an early grave trying to get the perfect Bayes training plus
>>> perfect rule set.
>> spam is bulk and unsolicited.   So yes the same message could be either,
>> but if a sender spams anyone, they are spammer, even if they send mail
>> that isn't spam.
>
> Ah, no, that way leads to disaster.

Not really. Maybe in theory, if you have a slightly wrong theory.

> Some people resign from lists by declaring the sender spam.

They are wrong, objectively, if they subscribed to the list without being 
deceived about what the list entailed.

There is definitely a tension between the recognition that in the ultimate 
analysis, 'spam' is in the eye of the beholder and the fact that there exist 
clear lines between the grey areas in which one can rationally debate whether a 
particular message is spam and the undebatable areas where the ham/spam 
discrimination is clear.

> That could end up cutting access to all the people who want the emails.

That's far outside the realm of possible effects of defining what the default 
welcomelist means and managing it transparently in line with that definition.

> (At various times this list and most 'ix lists were unusually difficult to 
> resign from. And, yes, I have been around that long. I'm just too politically 
> incorrect for most lists these days, {^_-}) It is wise to be careful about 
> how soon you pull the "spammer" trigger. YMMV and YAMV (Attitude).

FWIW, we can't maintain SA to accommodate the obstinacy of gated BITNET 
LISTSERV nodes in '89. The only reasons for unsub difficulties in 2024 are 
technical failures and spammer excuses. Modern SpamAssassin is only supposed to 
deal with modern realities, not historical curiosities.



-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Defining what the default welcomelist means

2024-04-13 Thread Bill Cole
On 2024-04-12 at 19:01:21 UTC-0400 (Fri, 12 Apr 2024 19:01:21 -0400)
Greg Troxel 
is rumored to have said:

> Also, I'm not sure you said this, but I would say:
>
>default whitelist is dkim only

No. Existing practice is that we trust both DKIM and SPF, and I think that's 
fine.

There are no unauthenticated listings extant in the default rules and no new 
ones should ever be created.
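For reference, authenticated entries look like this (the directives are from Mail::SpamAssassin::Conf; the bank address is illustrative, the Microsoft entry mirrors the one discussed in the shipped rules):

```
# Matches only if the message passes SPF or DKIM for the sender
welcomelist_auth      alerts@bank.example.com

# Reduced-power form used for entries shipped in the default rules
def_welcomelist_auth  *@accountprotection.microsoft.com
```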

>This means
>
>  All existing entries are converted to dkim as well as we can, not
>  worrying if they break.  We'll prune ones that don't work as dkim,
>  and add a signing domain as we figure it out, as a lightweight
>  thing.  But all non-dkim entries go away.
>
>  to consider a new entry, it must be dkim
>
> or maybe that's already true


s/dkim/authenticated/ and it's already true.

This is part of how the default welcomelist has lost alignment with its 
origins. The original was a tactical mitigation against heavy phishing in a 
largely unauthenticated-sender world, deployed in part to forestall extreme 
responses to the problem of everyone claiming to send Paypal notifications to 
everyone.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Defining what the default welcomelist means

2024-04-13 Thread Bill Cole
On 2024-04-12 at 18:56:15 UTC-0400 (Fri, 12 Apr 2024 18:56:15 -0400)
Greg Troxel 
is rumored to have said:

> I see it very slightly differently, but mostly agree
>
> Bill Cole  writes:
>
>> 1. We serve our users: receivers, not senders. Senders claiming FPs
>> need the support of a corroborating would-be receiver.
>
> Agreed.  Or maybe we take requests to add only from receivers.

Effectively, yes. Senders won't refrain from requesting to be welcomed by 
default just because we say we don't accept those requests. Only receivers can 
corroborate the existence of any FP problem which would be solved by a default 
welcomelist entry, and this isn't a 'just find one example' sort of issue.

>> 2. If senders have FPs on objectively legitimate mail, their first and
>> most important step is to identify WHY SpamAssassin thinks it is
>> spam, and address that. Do you need the invisible text? Is the message
>> embedded in a remotely-fetched image? The sea of "&zwnj" entities in
>> your messages' HTML serves what purpose exactly? If there's a real FP
>> problem with some rule that regularly is proved out by RuleQA, open a
>> bug.
>
> Sure, but if you serve receivers, often people will have misfiling and
> the sender is opaque, even if not spam and dkim.  So saying the sender
> should fix is misaligned with serving receivers.  Yes, they *should*,
> but people shouldn't send html mail either :-)

I don't see this as misaligned, but rather a way of saying that def_w* entries 
come behind site-local receiver mitigations and receiver/sender collaboration 
on fixing the shabby mail.

> I agree that requests from senders should be met with "make your mail
> less spammy".

Right. If SA is generating FPs, in nearly all cases this can be fixed without 
resorting to a global welcomelist entry. There's a balance between local rule 
mitigations, sender adjustments to lose spamsign patterns, and tweaks to the 
rules at the project level which validate in RuleQA in how FP issues are 
solved, and def_wl entries really should be a last resort.

One reason I opened this topic is that many existing listings were nothing like 
last resorts to solve concrete problems but seem to be more prophylactically 
applied. I.e. to assure that generally (and vaguely) 'good' senders will get 
their mail through despite using pointless antipatterns that are predominantly 
used by spammers. Maybe there's a need for that, but it should not be part of 
SA proper.


>> 3. This is NOT a general-purpose reputation list. It exists to aid SA
>> users who have FPs from SpamAssassin default rules for wanted mail,
>> where we cannot determine any acceptable adjustment to rules which
>> would avoid the problem. It is a "last resort" form of FP mitigation
>> when we cannot find an acceptable general solution that isn't
>> domain-specific to a widely accepted sender domain.
>
> I see all spam classification as probabalistic and there is risk of FP.
> If a domain emits *only ham* and is dkim signed, and we believe that
> receivers want it, I think it makes sense to have it in.

I see no point in that if there is no *evidence* of actual FPs. I don't think 
the default rules should try to game local incidents of Bayes or AWL 
dis-learning that ends up hitting banking notifications. Or (at the risk of 
being misinterpreted...) by the use of 3rd-party rules like the KAM channel 
that are much tougher on the bad HTML practices of corporate email composers.

> I think of things like alerts from banks, airline saying your flight
> time has changed, etc. where FPs are a real problem.

Right. I think we basically have that covered with the legacy entries, which 
are extensive, undocumented, and generally banal.

> I am extremely skeptical of anything that smells of email marketing
> here.  I would expect only places sending transactional mail and alerts
> to established customers.

I share the skepticism, but I have been working with business customers and 
their love of other businesspeople's email marketing (and random 
non-work-related email...) for long enough that I have stopped arguing with the 
nature of email that people eagerly desire in their mailboxes. I care that it 
is contextually safe, legal, and solidly consensual. There are marketers who 
stay inside the lines.

>> 4. We should only add or remove listings based on specific requests
>> backed by transparent evidence. Subversion commit messages are not
>> enough, we need a bug report or a mailing list discussion.
>
> sure

Important because it brings us more in line with the transparency norms that 
all ASF projects are expected to follow and because it reduces the likelihood 
of snowballing conflict to have a record.

Re: Dynamic blacklist ?

2024-04-12 Thread Bill Cole

On 2024-04-12 at 02:14:59 UTC-0400 (Fri, 12 Apr 2024 08:14:59 +0200)
Pierluigi Frullani 
is rumored to have said:


Hello all,
  do you know if there is a way to have a blacklist, either for a user or 
eventually for an entire server, that could be fed via some scripts?


If you enable the AWL (or TxRep, if you are adventurous) Plugin, it 
provides an automated welcome/blocklist mechanism where the past base 
score of messages are used to adjust the score of later messages from 
the same sender and network block tuple. Its power can be adjusted and 
its usage is described in the documentation.


The same database is used for the blocklist and welcomelist options of 
the spamassassin command-line script, which is documented in the 
'spamassassin-run' man page. There is also a useful script named sa-awl 
with a fine man page.



A sort of auto_learn but only for addresses (to or from)?


Correct: auto_learn and the sa-learn commandline script feed whole 
messages to a complex naive Bayesian analysis that feeds the Bayes DB. 
The 'auto_learn' config, the *list options for spamassassin, and sa-awl 
all operate on the AWL DB using a very simple algorithm.


Unlike the Bayes subsystem, the AWL subsystem has no minimum data 
threshold. If you feed one message to 'spamassassin -W' then the next 
message from the same sender+network combination will have its score 
adjusted according to your auto_welcomelist_factor setting, as 
documented in 'perldoc Mail::SpamAssassin::Plugin::AWL' along with all 
the other details of AWL.
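The adjustment itself is simple; a minimal sketch of the idea (an assumption-laden simplification of the documented auto_welcomelist_factor behavior, not the plugin's actual Perl code, which also tracks running totals per sender+network tuple in a DB):

```python
def awl_adjust(msg_score: float, history_mean: float, factor: float = 0.5) -> float:
    """Pull a message's score toward the sender's historical mean score.

    factor=0.5 (the documented default) splits the difference between
    the current score and the sender's past average.
    """
    return msg_score + (history_mean - msg_score) * factor


# A sender whose past mail averaged 0.0 halves a new 10-point score
print(awl_adjust(10.0, 0.0))  # → 5.0
```

With a single prior message at score 0, a later 10-point message lands at 5; senders whose history matches the current score see no change at all.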



I'll try to explain: I maintain a couple of mail servers that have very 
limited email volume, at least outbound, so Bayes is almost useless as it 
takes ages to be fed for the ham part. At the moment I'm taking addresses 
from the spam directory and feeding them to local.cf, but it's a slow (and 
painful) process, so if there is a better way it would be fantastic.


I guess this is the short version of an answer...

If you have AWL enabled and configured so that everyone uses the same 
AWL DB, you could do this if you have a directory full of fresh spam 
whose senders you want to shun:


   cd $spamdirectory
   spamassassin --add-to-blocklist *

And if you have a bunch of mail you value in a directory, use "-W" 
instead.





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Defining what the default welcomelist means

2024-04-12 Thread Bill Cole
The de-welcomelisting of MS marketing raises the question: Why do we maintain a 
"default" welcomelist?

Based on the documentation, the original purpose of the def_welcomelist* (then 
whitelist) feature set was to shield a set of senders of purely legitimate mail 
from FPs, with a listing having reduced power relative to normal welcomelist 
entries *because they were widely phished by spammers*. This was before sender 
authentication (SPF & DKIM) was in broad use and before 
authentication-empowered welcomelist features existed. Having these senders 
weakly protected in SA was a preventative measure to keep frustrated admins (or 
poisoned Bayes DBs) from rashly overprotecting them (and all the phishes with 
them) or blocklisting them. Today there's much less risk in the welcomelist 
features because we recommend and use the authenticated forms, which means that 
if you like, you can use SA WL/BL rules to demand only authenticated mail from 
some senders. In that context, the purpose of the default welcomelist has wandered.

This explains to some degree the lack of clear relevance and meaning of 
listings. It is a remnant feature that has outlived its original justification. 
The original list included big names of respected companies that "everyone" 
(i.e. as warped by SA committer and vocal user non-diversity...) got occasional 
important mail from and would never want to block mail from. It has drifted 
into being a list of "good guys" who, based on our committers' experiences, get 
FPs that they do not deserve. We have drifted perilously close to being a 
maintainer of a low-visibility free reputation service with lax oversight. We 
also come near that peril in our explicit lists of TLDs which are objectively 
dominated by spammers, but in that case I think we have that risk contained 
because we have a methodology for validating TLD inclusion and removal: testing 
single-TLD rules in QA. The default welcomelist is unconfined, because we don't 
have a clear explicit standard or even a formal transparent mechanism for 
inclusion and removal. My understanding of some listings is that they were 
based on one mid-sized site's FPs from their wanted mailstream dominated by 
one-to-few "personal" B2B email. That is very hard to validate, and at this 
point the list is too big to trim it back to just the original concept, 
especially since that concept no longer has much real-world use. It needs more 
structure to keep it from becoming just "friends, employers, and extorters of 
SA committers" or being perceived as such.

I believe that part of a way to avoid that is an absolute zero-tolerance policy 
for spam from listees. We cannot support any standard that gets us bogged down 
into debates with senders over whether their spam is enough spam to justify 
risking the FPs they would get without a listing, because we cannot measure 
that. We cannot be subservient to sender business models that require them to 
take shortcuts in assuring that they do not ever send spam. We must not be 
telling our users that they should just eat their spam because some sender 
doesn't want to spend on confirming users and seems to have a working unsub 
link.

I ultimately want to document how we add and remove listings and what users 
should expect from the default welcomelist. I think some important elements are:

1. We serve our users: receivers, not senders. Senders claiming FPs need the 
support of a corroborating would-be receiver.

2. If senders have FPs on objectively legitimate mail, their first and most 
important step is to identify WHY SpamAssassin thinks it is spam, and address 
that. Do you need the invisible text? Is the message embedded in a 
remotely-fetched image? The sea of "&zwnj" entities in your messages' HTML 
serves what purpose exactly? If there's a real FP problem with some rule that 
regularly is proved out by RuleQA, open a bug.

3. This is NOT a general-purpose reputation list. It exists to aid SA users who 
have FPs from SpamAssassin default rules for wanted mail, where we cannot 
determine any acceptable adjustment to rules which would avoid the problem. It 
is a "last resort" form of FP mitigation when we cannot find an acceptable 
general solution that isn't domain-specific to a widely accepted sender domain.

4. We should only add or remove listings based on specific requests backed by 
transparent evidence. Subversion commit messages are not enough, we need a bug 
report or a mailing list discussion.

5. Existing entries are presumed valid unless and until they cause a false 
"ham" classification of spam which can be shared publicly in a useful form.

6. New entries must pass prolonged RuleQA testing of sender-specific rules 
before being added to the default welcomelist.

As with everything SpamAssassin: input from users and other contributors is 
eagerly desired.

-- 
Bill Cole

WARNING: Microsoft has earned removal from SA default welcomelist

2024-04-12 Thread Bill Cole
Yesterday I received marketing spam from "Microsoft 
" advertising something apparently called 
"Microsoft Build" which is either a website or a marketing event: IDGAF. Spam 
was sent via Marketo, which I gather is now part of the sewer we call Adobe. It 
was absolutely authentic. Fully authentic Microsoft spam passing SPF, DKIM, and 
DMARC.

That spam was sent to my oldest and most widely scraped address 
(b...@scconsult.com) which I've literally never given to anyone for subscribing 
to or purchasing anything and which I am 100% certain I've never given to 
Microsoft in any way intentionally. There is no indication in the spam of any 
associated MS account. My comprehensive 29yr archive of all email ever received 
by that address has NO prior mail from MS. There was an unsub link, which got a 
page which revealed that I was somehow subscribed to multiple marketing 
bullshit lists. That page offered me a link to my "profile"(!?) which seemed to 
start to want to load up a page with an image and text placeholder blobs 
pulsing a bit before switching to a generic Microsoft account signup/login 
page. MS knew what my email address was and had me subscribed to multiple lists 
in some sort of "profile" without even asking me and without associating it to 
any actual MS account that I could conceivably access. I do have multiple MS 
accounts that I need for work purposes, and one I use for testing, but none of 
those are associated with b...@scconsult.com (except as a correspondent.)

In my opinion, this is an indication that the default welcomelist entries in 
the official SpamAssassin rules for '*@*.microsoft.com' are inappropriate. Note 
that there is an entry for '*@accountprotection.microsoft.com' which is still 
justified as far as I know. This is entirely unrelated to any domains hosted by 
Microsoft, it is strictly an email address welcomelisting (see SA docs for 
details.)

I will be committing the rule change today and it should appear in the default 
rules distribution channel by Monday. Anyone who is relying on that SA 
welcomelisting to accept wanted mail from MS should do so locally based on the 
specific local needs. I will also document this in a bug report, which I will 
resolve, to have a record of when and why this was done.

This may raise some questions and trigger a debate on the formal meaning of the 
SA default welcomelist entries. That debate belongs on the SpamAssassin Users 
List, but may pop up elsewhere. I believe that we have left a gap there in 
having a quite vague definition of what default welcomelist entries represent. 
As far as I know, clear criteria for inclusion have never been promulgated and 
accepted by the PMC or the user community.

More to follow in a separate thread.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: problems with Plugin::ASN and spam

2024-04-11 Thread Bill Cole
On 2024-04-10 at 21:19:48 UTC-0400 (Wed, 10 Apr 2024 20:19:48 -0500)
Darrell Budic 
is rumored to have said:

>> On Apr 10, 2024, at 2:52 PM, Benny Pedersen  wrote:
>>
>> Darrell Budic skrev den 2024-04-10 19:48:
>>
>>> Anything I’m missing?
>>
>> using amavisd ?
>>
>> then try this in amavisd.conf:
>>
>>
>> @spam_scanners = (
>># ['SpamAssassin', 'Amavis::SpamControl::SpamAssassin'],
>>['SpamdClient', 'Amavis::SpamControl::SpamdClient']
>> );
>>
>> 1;  # insure a defined return value
>>
>> if this works, its amavisd missing to add that header spamassassin add in 
>> add-header
>>
>> don't enable both spam_scanners, just one of them, and with the last, start 
>> spamd, as you already have this
>>
>> would be nice if its just that
>>
>
> No, I'm using spamass-milter to send it over from postfix. Here's my 
> spamass-milter config in case I missed something there (systemd running it on 
> alma 8 in this case):
>
> EXTRA_FLAGS="-e onholyground.com -u defang -m -r 15 -i 127.0.0.1 -g sa-milt 
> -- --max-size=512 --dest=sa0.int.ohgnetworks.com,sa1.int.ohgnetworks.com 
> --randomize"

That's intriguing because "-u defang" looks like cargo-cult spoor from an 
installation running MIMEDefang. Does the user 'defang' have appropriate 
configs?

> Both sa0 & sa1 run the same spamassassin/spamd configurations, neither of 
> them add the X-Spam-ASN headers. All other add_header entries work fine.

Validate that the configs on both machines match. In this sort of setup, only 
the SA config on the spamd hosts (specifically, that of the user spamd runs 
as) makes any difference.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: problems with Plugin::ASN and spam

2024-04-10 Thread Bill Cole
On 2024-04-10 at 13:48:47 UTC-0400 (Wed, 10 Apr 2024 12:48:47 -0500)
Darrell Budic 
is rumored to have said:

> Just checking in here that I’m not doing something wrong with the ASN plugin 
> before I file a bug on this. SpamAssassin 4.0.1 installed from cpan on Alma 9.
>
> I’ve got it configured to use the local maxmind db files, and those show up 
> in logs. Testing in spamassassin itself show that it finds the ASN and 
> includes it in the headers as expected. But when I let spamc/spamd process 
> emails, the X-Spam-ASN headers do not appear. Enabling debug logging on spamd 
> shows it does find the ASN properly, but doesn’t include the header. All my 
> other add_header entries show up as expected.

This smells like a case of not using the config that you think you are.

> Relevant config:

Says you... :)

When you run the spamassassin script from the command line, it loads your user 
prefs from ~/.spamassassin/user_prefs and uses them. When you use spamc to talk 
to spamd, which prefs are loaded depends on your configuration of spamd, 
perhaps using only the global config, possibly using the config of the user 
running spamd, and possibly (with configuration of spamd that allows it to use 
per-user configs properly) that of arbitrary users per message.

Differences in how spamc/spamd and spamassassin on the command line behave are 
almost always due to this.
> report_safe 0
> ifplugin Mail::SpamAssassin::Plugin::ASN
>  asn_prefix ''
>  asn_lookup asn.routeviews.org _ASN_ _ASNCIDR_
>  add_header all ASN _ASN_ _ASNCIDR_
>
>  # IPv6 support (Bug 7211)
>  asn_lookup_ipv6 origin6.asn.cymru.com _ASN_ _ASNCIDR_
> endif   # Mail::SpamAssassin::Plugin::ASN
>
> From the spamd debug log:
>
> Wed Apr 10 17:06:50 2024 [2246409] dbg: geodb: GeoIP2: search found asn 
> /usr/share/GeoIP/GeoLite2-ASN.mmdb
> Wed Apr 10 17:06:50 2024 [2246409] dbg: geodb: GeoIP2: loaded asn from 
> /usr/share/GeoIP/GeoLite2-ASN.mmdb
> Wed Apr 10 17:07:09 2024 [2246418] dbg: asn: using GeoDB ASN for lookups
> Wed Apr 10 17:07:09 2024 [2246418] dbg: asn: using first external relay IP 
> for lookups: 149.72.37.58
> Wed Apr 10 17:07:09 2024 [2246418] dbg: asn: GeoDB found ASN 11377
>
> There are no dgb: markup: entries for the ASN header.
>
> Anything I’m missing?

Look at the 'config' debug channel and determine which config files are 
actually being used by spamd and by spamassassin. (spamc knows nothing of SA 
configs...)


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: OT: Trigger words in email addresses?

2024-04-09 Thread Bill Cole

On 2024-04-07 at 21:40:40 UTC-0400 (Sun, 7 Apr 2024 20:40:40 -0500)
Jerry Malcolm 
is rumored to have said:

 But I have a co-worker that is convinced that "donotre...@xyz.com" is 
a trigger for gmail's spam filters and all spam filters will score the 
email higher as spam due simply to that word in the email address. 


1. "All spam filters" isn't a useful phrase. Nothing is true of all spam 
filters.


2. Google's filters are, beyond their documented rules, entirely opaque. 
Anyone who claims to know anything about how they work internally is not 
to be trusted.  I seem to recall someone who maintains GMail filtering 
(Brandon Long) saying as much in the MailOps list.


3. I just sent myself a message from donotre...@billmail.scconsult.com 
(a never-before-seen bogus address) via my personal mail server to one 
of my GMail accounts and it delivered into the Inbox. So your cow-orker 
is simply wrong.


Obviously, you need to follow all of Google's well-publicized 
recommendations for volume senders if you want to stand any chance of 
getting messages into INBOX instead of Spam. Other tricks that *SEEM TO 
ME* to help are to send simple text messages instead of complex 
multipart/alternative messages with HTML or (WORSE) pure HTML. Modern 
MUAs recognize URLs in plaintext and for basic confirmations like this, 
you should keep the message as simple, clear, and unadorned as possible.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Multiple test failures

2024-04-03 Thread Bill Cole

On 2024-04-03 at 14:01:44 UTC-0400 (Wed, 3 Apr 2024 14:01:44 -0400)
Scott Ellentuch 
is rumored to have said:


Hi,

Ok, deleted the directory and started again.

Test Summary Report
---
t/spamd_client.t(Wstat: 26624 Tests: 4 Failed: 0)
  Non-zero exit status: 104
  Parse errors: Bad plan.  You planned 52 tests but ran 4.
Files=217, Tests=3765, 890 wallclock secs ( 1.21 usr  0.19 sys + 
271.62

cusr 25.51 csys = 298.53 CPU)
Result: FAIL
Failed 1/217 test programs. 0/3765 subtests failed.
make: *** [test_dynamic] Error 255

Script file attached.


This error appears to be a problem launching a spamd instance from the 
test harness, verifying its PID, and getting responses from it. You can 
get more details logged by clearing the test logs and re-running just 
the one test file which displays the problem:


  rm -r t/log
  make test TEST_FILES="t/spamd_client.t"  TEST_VERBOSE=1

That should provide the precise command used to launch spamd and 
hopefully a clue about why it failed. There may also be useful 
information logged under t/log/ after a failed test.


One possibility is a local packet filter (iptables, nftables, etc.) 
blocking the port spamd uses for testing. That is rare, because the test 
run selects an unused high port on the loopback interface, but a very 
tight network security policy can make it fail. SELinux and AppArmor can 
also interfere.





Thanks Tuc

On Wed, Apr 3, 2024 at 10:46 AM Bill Cole <
sausers-20150...@billmail.scconsult.com> wrote:


On 2024-04-02 at 18:18:09 UTC-0400 (Tue, 2 Apr 2024 18:18:09 -0400)
Scott Ellentuch 
is rumored to have said:


Hi,

Trying to install SA 4.0.1 from scratch. Tried via CPAN, that didn't
go
well, so trying from tarball. (Enabled SSL when doing Makefile.PL)


NEVER run 'make' as root except when you're ready to commit with 
'make

install' unless you're doing it on a sacrificial system.

Think about how unsafe it could be...

These test failures look like you did that. I am flattered that you
trust the SpamAssassin team that much, but don't, please. We are only
human. In the past there have been bugs in the test suite that have
polluted the running config of the system if run as root. It is 
possible

in principle for there to still be such bugs.


I'm on Amazon Linux 2 , 4.0.1 SA, and not sure what other info I can
give.
I installed every perl module it wanted.


FWIW, non-root 'make test' has been clean for PMC members on a wide
range of systems, so a real test failure would be both a shock and a
serious problem. If this is happening with a normal user running 
'make

test' we definitely need to
address it.



The final summary is -
Test Summary Report
---
t/spamc_optL.t  (Wstat: 2560 Tests: 18 Failed: 10)
  Failed tests:  2, 5-8, 10, 12, 15-16, 18
  Non-zero exit status: 10
t/spamd_client.t(Wstat: 3584 Tests: 52 Failed: 14)
  Failed tests:  35, 37-42, 44, 46-51
  Non-zero exit status: 14
Files=217, Tests=3807, 904 wallclock secs ( 1.21 usr  0.22 sys +
273.72
cusr 26.33 csys = 301.48 CPU)
Result: FAIL
Failed 2/217 test programs. 24/3807 subtests failed.
make: *** [test_dynamic] Error 255

During the run it seems to output :

t/spamd_client.t .. 32/52
#   Failed test at t/spamd_client.t line 152.
ERROR: Bayes dump returned an error, please re-run with -D for more
information
t/spamd_client.t .. 37/52
#   Failed test at t/spamd_client.t line 157.
Not found: spam in database = 1 0  non-token data: nspam at
t/spamd_client.t line 158.

#   Failed test at t/SATest.pm line 926.

#   Failed test at t/spamd_client.t line 161.
ERROR: Bayes dump returned an error, please re-run with -D for more
information

#   Failed test at t/spamd_client.t line 165.
Not found: ham in database = 0 0  non-token data: nham at
t/spamd_client.t
line 166.

#   Failed test at t/SATest.pm line 926.
Not found: spam in database = 0 0  non-token data: nspam at
t/spamd_client.t line 166.

#   Failed test at t/SATest.pm line 926.
t/spamd_client.t .. 44/52
#   Failed test at t/spamd_client.t line 172.
ERROR: Bayes dump returned an error, please re-run with -D for more
information

#   Failed test at t/spamd_client.t line 177.
Not found: ham in database = 1 0  non-token data: nham at
t/spamd_client.t
line 178.

#   Failed test at t/SATest.pm line 926.

#   Failed test at t/spamd_client.t line 181.
ERROR: Bayes dump returned an error, please re-run with -D for more
information
t/spamd_client.t .. 49/52
#   Failed test at t/spamd_client.t line 185.
Not found: ham in database = 0 0  non-token data: nham at
t/spamd_client.t
line 186.

#   Failed test at t/SATest.pm line 926.
Not found: spam in database = 0 0  non-token data: nspam at
t/spamd_client.t line 186.

#   Failed test at t/SATest.pm line 926.
t/spamd_client.t 

Re: Syslog local3

2024-04-03 Thread Bill Cole

On 2024-04-03 at 05:49:20 UTC-0400 (Wed, 3 Apr 2024 11:49:20 +0200)
Emmanuel Seyman 
is rumored to have said:


Hello, all.

It's taken me nearly a year to realize this but spamassassin sends to
syslog with the local3 facility, not 'mail' as I had assumed.


The spamd daemon logs as mail as configured in the source distribution, 
but a packager (e.g. Debian) may modify that. If you are using something 
else to call Spamassassin, e.g. Amavis, MIMEDefang, etc., that other 
software controls the logging.



Is this something that can be configured?


If you're running spamd, the facility is set with the "-s" option, as 
documented in the man page.
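
For instance, a sketch of a spamd invocation that logs to the "mail" facility (the other flags shown are common but illustrative; check spamd(8) and your distribution's init/systemd unit for what is actually used):

```shell
# Illustrative only: "-s mail" selects the syslog facility; the rest of
# the flags are a plausible but hypothetical daemon setup.
spamd -s mail --max-children 5 -d --pidfile /var/run/spamd.pid
```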


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Multiple test failures

2024-04-03 Thread Bill Cole

On 2024-04-02 at 18:18:09 UTC-0400 (Tue, 2 Apr 2024 18:18:09 -0400)
Scott Ellentuch 
is rumored to have said:


Hi,

Trying to install SA 4.0.1 from scratch. Tried via CPAN, that didn't 
go

well, so trying from tarball. (Enabled SSL when doing Makefile.PL)


NEVER run 'make' as root except when you're ready to commit with 'make 
install' unless you're doing it on a sacrificial system.


Think about how unsafe it could be...

These test failures look like you did that. I am flattered that you 
trust the SpamAssassin team that much, but don't, please. We are only 
human. In the past there have been bugs in the test suite that have 
polluted the running config of the system if run as root. It is possible 
in principle for there to still be such bugs.


I'm on Amazon Linux 2 , 4.0.1 SA, and not sure what other info I can 
give.

I installed every perl module it wanted.


FWIW, non-root 'make test' has been clean for PMC members on a wide 
range of systems, so a real test failure would be both a shock and a 
serious problem. If this is happening with a normal user running 'make 
test' we definitely need to

address it.



The final summary is -
Test Summary Report
---
t/spamc_optL.t  (Wstat: 2560 Tests: 18 Failed: 10)
  Failed tests:  2, 5-8, 10, 12, 15-16, 18
  Non-zero exit status: 10
t/spamd_client.t(Wstat: 3584 Tests: 52 Failed: 14)
  Failed tests:  35, 37-42, 44, 46-51
  Non-zero exit status: 14
Files=217, Tests=3807, 904 wallclock secs ( 1.21 usr  0.22 sys + 
273.72

cusr 26.33 csys = 301.48 CPU)
Result: FAIL
Failed 2/217 test programs. 24/3807 subtests failed.
make: *** [test_dynamic] Error 255

During the run it seems to output :

t/spamd_client.t .. 32/52
#   Failed test at t/spamd_client.t line 152.
ERROR: Bayes dump returned an error, please re-run with -D for more
information
t/spamd_client.t .. 37/52
#   Failed test at t/spamd_client.t line 157.
Not found: spam in database = 1 0  non-token data: nspam at
t/spamd_client.t line 158.

#   Failed test at t/SATest.pm line 926.

#   Failed test at t/spamd_client.t line 161.
ERROR: Bayes dump returned an error, please re-run with -D for more
information

#   Failed test at t/spamd_client.t line 165.
Not found: ham in database = 0 0  non-token data: nham at 
t/spamd_client.t

line 166.

#   Failed test at t/SATest.pm line 926.
Not found: spam in database = 0 0  non-token data: nspam at
t/spamd_client.t line 166.

#   Failed test at t/SATest.pm line 926.
t/spamd_client.t .. 44/52
#   Failed test at t/spamd_client.t line 172.
ERROR: Bayes dump returned an error, please re-run with -D for more
information

#   Failed test at t/spamd_client.t line 177.
Not found: ham in database = 1 0  non-token data: nham at 
t/spamd_client.t

line 178.

#   Failed test at t/SATest.pm line 926.

#   Failed test at t/spamd_client.t line 181.
ERROR: Bayes dump returned an error, please re-run with -D for more
information
t/spamd_client.t .. 49/52
#   Failed test at t/spamd_client.t line 185.
Not found: ham in database = 0 0  non-token data: nham at 
t/spamd_client.t

line 186.

#   Failed test at t/SATest.pm line 926.
Not found: spam in database = 0 0  non-token data: nspam at
t/spamd_client.t line 186.

#   Failed test at t/SATest.pm line 926.
t/spamd_client.t .. 52/52 # Looks like you failed 14 
tests

of 52.
t/spamd_client.t .. Dubious, test returned 14 (wstat 
3584,

0xe00)
Failed 14/52 subtests

Any indications as to the issue?

Thanks, Tuc



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Doesn't spamc/spamd need block/welcomelist support???

2024-03-21 Thread Bill Cole
On 2024-03-21 at 13:21:54 UTC-0400 (Thu, 21 Mar 2024 18:21:54 +0100)
 
is rumored to have said:

> On 3/20/24 21:58, Bill Cole wrote:
>> I'm not sure how I've not noticed before, but unless I'm missing something, 
>> there is no way to replicate the [block,welcome]list functionalities of the 
>> spamassassin script when using the spamc/spamd interface.
>>
>> Does anyone see it hiding somewhere that I don't?
>>
>> Does anyone have any rationale for this missing functionality?
>>
>> I don't expect that it would be difficult to add. (Something I've believed 
>> every time I've taken on a coding task...)
>>
> are you referring to spamassassin -W/-R options that are not present on 
> spamc(1) ?

Yes, (plus all of the related --*-list commands.)

It seems to me that it would require extension of the spamc/spamd protocol and 
cargo-culting some code from spamassassin to spamd. I'm looking to provide a more 
elegant solution for 'strong' feedback than giving scanning-client machines 
direct access to the reputation DB. For example, the script I have that handles 
user-identified escaped spam includes this ugly snippet:

  spamassassin --remove-from-welcomelist < $spam
  spamc -L spam < $spam
  spamassassin --add-to-blocklist < $spam

This only works because the local spamassassin and remote spamd share access to 
the same reputation DB. This is not optimal.
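
One workaround sketch, assuming the spamd host is reachable over SSH and has a working spamassassin script: run the list-management commands remotely so that only the spamd host needs access to the reputation DB (the host name and variable wiring here are hypothetical):

```shell
# Hypothetical: pipe the escaped-spam message to the spamd host for the
# welcomelist/blocklist updates; the scanning client only needs spamc.
ssh spamd-host 'spamassassin --remove-from-welcomelist' < "$spam"
spamc -d spamd-host -L spam < "$spam"
ssh spamd-host 'spamassassin --add-to-blocklist' < "$spam"
```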



-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire




Re: Doesn't spamc/spamd need block/welcomelist support???

2024-03-21 Thread Bill Cole

On 2024-03-21 at 12:08:48 UTC-0400 (Thu, 21 Mar 2024 17:08:48 +0100)
Matus UHLAR - fantomas 
is rumored to have said:


On 20.03.24 16:58, Bill Cole wrote:
I'm not sure how I've not noticed before, but unless I'm missing 
something, there is no way to replicate the [block,welcome]list 
functionalities of the spamassassin script when using the spamc/spamd 
interface.


Does anyone see it hiding somewhere that I don't?

Does anyone have any rationale for this missing functionality?

I don't expect that it would be difficult to add. (Something I've 
believed every time I've taken on a coding task...)


How/where did you try to define it?


The *lists are used by spamd just fine, but spamd cannot do the 
equivalent of the spamassassin script's -R, -W, and related commands 
because spamc has no way to tell it to do those things.




"spamc -u" should pass username to spamd which then should use that 
users' user_prefs file (if it exists) unless spamd was started with 
"-x" parameter or can't access that file.


Imagine a world where spamc and spamd run on different machines, the 
ones with spamc may or may not have a working SA installation, and the 
spamd is using sitewide {W,B}Lists. Or per-user prefs but in a DB with 
virtual users.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Doesn't spamc/spamd need block/welcomelist support???

2024-03-21 Thread Bill Cole

On 2024-03-21 at 11:57:43 UTC-0400 (Thu, 21 Mar 2024 11:57:43 -0400)
Kris Deugau 
is rumored to have said:


Bill Cole wrote:
I'm not sure how I've not noticed before, but unless I'm missing 
something, there is no way to replicate the [block,welcome]list 
functionalities of the spamassassin script when using the spamc/spamd 
interface.


Does anyone see it hiding somewhere that I don't?

Does anyone have any rationale for this missing functionality?

I don't expect that it would be difficult to add. (Something I've 
believed every time I've taken on a coding task...)


I'm pretty sure you're doing something wrong (maybe a 
missing/commented loadplugin entry somewhere?


Maybe you've misunderstood my question. The spamc/spamd system uses 
whatever  AWL or TxRep DB is configured in evaluating messages and does 
the automated part of managing those. The spamassassin-run man page 
refers to spamd in the description of -W, -R, and various 
--{add,remove}_*list options, but I see no way that they are relevant to 
spamd.


Would expect that to bite the spamassassin script too tho), because 
this has been supported for a long time.


Great. Please enlighten me:

  How do I feed spamc a message and tell it to add or remove all addresses
  in it to the global or user welcomelist or blocklist (using +/- $bignum
  in the AWL or TxRep list), or tell it to add or remove individual
  addresses from those lists?

There is no mechanism to do that which I have found in our documentation 
or code. I have re-read the PROTOCOL document, which does not seem to 
document any mechanism to manipulate the reputation DB.
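
To make that concrete, here is a hedged sketch of what spamc puts on the wire, following the shape the PROTOCOL document describes (a verb line, headers, a blank line, then the raw message). The point is that the protocol only carries a verb such as CHECK, PROCESS, or TELL plus the message; there is no verb for editing welcome/block lists. The function name is mine, not SA's:

```shell
# Hedged illustration of the spamc wire format: verb line, headers,
# blank line, raw message. Command substitution strips trailing
# newlines, so this is a sketch, not a full client.
spamc_request() {  # usage: spamc_request VERB [USER] < message
  msg=$(cat)
  printf '%s SPAMC/1.5\r\n' "$1"
  printf 'Content-length: %s\r\n' "${#msg}"
  if [ -n "$2" ]; then printf 'User: %s\r\n' "$2"; fi
  printf '\r\n%s' "$msg"
}

# Sending this to spamd's port (783 by default) would be the next step;
# here we only show what the protocol can and cannot express.
printf 'ab' | spamc_request CHECK joe
```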


The spamassassin script does these things, usually by directly accessing 
file-based DBs in the local ~/.spamassassin. The strongest reason to use 
spamc/spamd is to have spamd on a different machine, so you cannot 
depend on the machine where spamc is running having a sane SA config or 
even a fully working SA installation.


On a single standalone system, IME you'd have to go to extra effort to 
get spamc/spamd to use a different config than the spamassassin script 
- if it works for one, it should work for the other.


On one machine, sure: you can just use the spamassassin script to work 
with the same AWL or TxRep DB that spamd uses. That's not a relevant 
case.


The solution I've used is to have both the spamc and spamd machines 
using the same configuration, with both having identical SA 
installations and config and being able to access the same reputation 
DB. That is sub-optimal.




I have file-based userprefs on my personal colo server with both[1] in 
use and working just fine for a long time (since early SA 2.x at 
least, IIRC), spamc called from promail on delivery.  Wearing my work 
hat we have all three of: local configuration in /etc, local rules 
channel configuration, and SQL-based userprefs also using 
block|welcome entries of several types, all working just fine[2] with 
spamc called on delivery from a custom local delivery agent.


I've just rechecked with SA 4 trunk, and a temporary spamd instance 
was quite happy to load an existing .cf containing a long list of 
local welcomelist entries[3], and correctly hit 
USER_IN_DKIM_WELCOMELIST more or less completely fresh out of the box.


The plugins are spread around a bit:

init.pre:  SPF supports welcomelist_from_spf
v3.12.pre: DKIM supports welcomelist_from_dkim
v3.20.pre: WLBLEval supports welcomelist_from[_rcvd]

welcomelist_auth seems to be some internal voodoo layered on top of 
welcomelist_from_dkim and welcomelist_from_spf; at a quick rummage it 
seemed to be quite happy to function and hit with *either* the SPF or 
DKIM plugins enabled, modulo having a suitable thing for 
welcomelist_auth to be looking at.


-kgd

[1] Well, legacy whitelist_*/blacklist_*, on account of having dragged 
the config along for so long...


[2] Aside from testing mail well after the fact, where it was sent 
through one or another bulk mail platform that sets a ridiculously 
short DKIM expiry timestamp.  Which isn't SA's fault, although it 
would be nice to have a command flag somewhere to force DKIM 
processing to be "as of Received: timestamp" or "as of between 
timestamps in DKIM header" so as to better confirm if there's even a 
point in adding the welcomelist_* entry.


[3] Also technically legacy whitelist_*, but the rule and config 
aliasing also worked fine.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Doesn't spamc/spamd need block/welcomelist support???

2024-03-20 Thread Bill Cole
I'm not sure how I've not noticed before, but unless I'm missing 
something, there is no way to replicate the [block,welcome]list 
functionalities of the spamassassin script when using the spamc/spamd 
interface.


Does anyone see it hiding somewhere that I don't?

Does anyone have any rationale for this missing functionality?

I don't expect that it would be difficult to add. (Something I've 
believed every time I've taken on a coding task...)


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: OT: Microsoft Breach

2024-03-19 Thread Bill Cole
On 2024-03-19 at 09:51:04 UTC-0400 (Tue, 19 Mar 2024 08:51:04 -0500)
Thomas Cameron 
is rumored to have said:

> Does anyone else just block all traffic from *.onmicrosoft.com?

Yes. No collateral damage noticed. That includes a system that has 
administrative and alerting role accounts which handle email alerts from Azure 
and MS365.

> I have literally NEVER gotten anything from that domain which is not obvious 
> junk.
>
> I set up postfix to just flat out refuse anything from that domain.[1] If I 
> get any complaints, I may ease it up, but I was getting TONS of spam messages 
> from that domain and I figured it was easiest to just block it.
>
> -- 
> Thomas
>
> [1]
>
> [root@east ~]# grep onmicrosoft /etc/postfix/sender_access
> /@*.onmicrosoft\.com/ REJECT
>
> [root@east ~]# grep sender_access /etc/postfix/main.cf
> check_sender_access regexp:/etc/postfix/sender_access
>
> On 3/18/24 21:13, Jimmy wrote:
>>
>> It's possible that certain email accounts utilizing email services with 
>> easily guessable passwords were compromised, leading to abuse of the 
>> .onmicrosoft.com subdomain for sending spam via email.
>>
>> I've observed an increase in the blocking of IPs belonging to Microsoft 
>> Corporation by the SpamCop blacklist since November 2023, with a notable 
>> spike in activity during February and March 2024.
>>
>> Jimmy
>>
>>
>> On Tue, Mar 19, 2024 at 12:10 AM Jared Hall via users 
>> mailto:users@spamassassin.apache.org>> wrote:
>>
>> I've several customers whose accounts were used to send spam as a
>> result
> of Microsoft's infrastructure breach.
>>
>> Curiously, NOBODY has received any breach notifications from Microsoft,
>> despite personal information being compromised.
>>
>> What has anyone else experienced?
>>
>> Thanks,
>>
>> -- Jared Hall
>>


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Callout verification with SpamAssassin ?

2024-02-19 Thread Bill Cole

On 2024-02-18 at 18:40:45 UTC-0500 (Mon, 19 Feb 2024 00:40:45 +0100)
Matija Nalis 
is rumored to have said:


Preface:

- Firsty: yes, I'm fully aware of all issues associated with
  https://en.wikipedia.org/wiki/Callout_verification
  (and there is a LOT of them!)


Which is why SA does not support such so-called verification in any way. 
It never will as long as I'm a contributor.



- I'm not looking for debate about general usefulness of Callout
  verification (and the system for which it is being investigated is
  not general-purpose e-mail system).


This is a bit like saying you don't want to debate the general 
usefulness of spamming. And then going on to ask about ways to spam. I 
do not care about why you are trying to use SMTP callback verification, 
because it is a fundamentally broken concept. If it looks like a 
solution to you, you are refusing to look at solving your real problem.


All set then. SA is not the right tool for you. Try something like Exim, 
MailMunge, or MIMEDefang, which let you write arbitrary code for the 
mail-handling flow. I suppose you may be able to do it in sendmail.cf too, 
if you're into self-torture.





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Plugin fo content modification

2024-02-19 Thread Bill Cole
On 2024-02-19 at 07:37:03 UTC-0500 (Mon, 19 Feb 2024 12:37:03 + 
(UTC))

Pedro David Marco via users 
is rumored to have said:


Hi everybody...
Does anyone know of a plugin for content modification?


Such a thing is not possible in SA, because SA has no mechanism for 
arbitrary content modification. When used in a milter like MIMEDefang or 
MailMunge, SA doesn't even do any header mods itself but rather relies 
on the milter to do it.



an example, i want to change the word 'sex'   for '---'   


As others have said: substring replacement in email is an unwise tactic 
that proved its utter uselessness in the '90s.


Aside from the fact that this would do active damage to the 
comprehensibility of some perfectly legitimate messages, it would 
invalidate any sort of authenticating signature (DKIM, PGP, S/MIME, 
whatever).


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: SpamAssassin4 + DCC not populating "X-Spam-DCC: : " header ?

2024-02-18 Thread Bill Cole

On 2024-02-18 at 14:21:41 UTC-0500 (Sun, 18 Feb 2024 14:21:41 -0500)
 
is rumored to have said:


Feb 18 11:18:06.796 [6905] dbg: dcc: local tests only, 
disabling DCC


That seems like a clear explanation: your configuration has disabled 
'net' tests. You seem to have dns_available set to 'no'.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Bayes "corpus" - how old?

2024-01-31 Thread Bill Cole

On 2024-01-31 at 08:16:13 UTC-0500 (Wed, 31 Jan 2024 14:16:13 +0100)
Matus UHLAR - fantomas 
is rumored to have said:


On 2024-01-30 at 12:08:18 UTC-0500 (Tue, 30 Jan 2024 18:08:18 +0100)
Matus UHLAR - fantomas 
is rumored to have said:

[...]
autolearn may help if your DB is well maintained, although I have 
disabled nearly all rules with negative scores, like


RCVD_IN_DNSWL_*
RCVD_IN_IADB_* DKIMWL_WL_*
RCVD_IN_MSPIKE_*
RCVD_IN_VALIDITY_*
USER_IN_DEF_*
ALL_TRUSTED

etc, because spammers often abuse these.
I mean, they may have negative score but don't train on them.


On 30.01.24 15:31, Bill Cole wrote:
If spammers can 'abuse' ALL_TRUSTED you have a major problem. Either 
a serious misconfiguration or compromised machines in 
trusted_networks.


Can't ALL_TRUSTED happen if a spammer delivers mail directly to my 
network, or if the last mail server removes Received: headers?

I think this happened to me in the past but I may be wrong


I just did a manual test on my personal machine to confirm: mail entered 
manually in a connection to port 25 from an unprivileged network with no 
Received headers did NOT get an ALL_TRUSTED match.


The semantics around the word 'trusted' in SA are subtle and arcane. 
There's an important distinction between trusting that a particular MTA 
writes transparent and honest Received headers and trusting that a 
particular MTA does not relay spam. For example, I have 2 address blocks 
in my trusted_networks that are used by the ASF for forwarding, which I 
needed precisely because those machines sometimes forward spam and I 
need SA to look beyond the immediate clients, which I know tell me the 
truth about where they get the spam they offer me.
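
A hedged local.cf sketch of that distinction (the addresses are documentation-range examples, not the real ASF blocks):

```
# local.cf fragment -- example values only
# Hosts whose Received headers we believe honest, even though they may
# sometimes relay spam to us (e.g. trusted forwarders):
trusted_networks 192.0.2.0/24 198.51.100.0/24
# Hosts that are part of our own mail infrastructure:
internal_networks 192.0.2.0/24
```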



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Bayes "corpus" - how old?

2024-01-30 Thread Bill Cole

On 2024-01-30 at 12:08:18 UTC-0500 (Tue, 30 Jan 2024 18:08:18 +0100)
Matus UHLAR - fantomas 
is rumored to have said:

[...]
autolearn may help if your DB is well maintained, although I have 
disabled nearly all rules with negative scores, like


RCVD_IN_DNSWL_*
RCVD_IN_IADB_* DKIMWL_WL_*
RCVD_IN_MSPIKE_*
RCVD_IN_VALIDITY_*
USER_IN_DEF_*
ALL_TRUSTED

etc, because spammers often abuse these.
I mean, they may have negative score but don't train on them.


If spammers can 'abuse' ALL_TRUSTED you have a major problem. Either a 
serious misconfiguration or compromised machines in trusted_networks.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Bayes "corpus" - how old?

2024-01-30 Thread Bill Cole

On 2024-01-30 at 09:59:52 UTC-0500 (Tue, 30 Jan 2024 09:59:52 -0500)
joe a 
is rumored to have said:


Advisable to "prune" Bayes data based on age?


Yes. That is why it has an expiration model. Expiration may be de facto 
blocked on some busy systems so you may need to explicitly force it 
occasionally. The command "sa-learn --dump magic" will show you 
expiration and other Bayes metadata.
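
For example (both are standard sa-learn options; the exact output fields vary by Bayes backend):

```shell
# Inspect Bayes metadata: nspam/nham counts, token counts, last expire.
sa-learn --dump magic
# Explicitly run a token-expiration pass if it is not happening on its own.
sa-learn --force-expire
```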


While cleaning up recent Ham/Spam, found my "saved SPAM" goes back to 
2013.


Why that's over . . . wait, I need to take off my socks . . .


I've still got some almost 3x as old. BUT: I do not use it for training 
SA today.



So, how old is "too old".  For saved SPAM?


I would suggest a year as the outer edge of Bayes usefulness.

I find it helpful to keep my decades of garbage because I use them (and 
my ham archive) in developing prospective rules. There are non-obvious 
fingerprints in some spam that imply decades-long spamming operations.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: install SA p a i n f u l l

2024-01-30 Thread Bill Cole

On 2024-01-29 at 23:06:07 UTC-0500 (Tue, 30 Jan 2024 14:06:07 +1000)
Nick Edwards 
is rumored to have said:


omfg
even killing it, then having to kill every individual  sub process
manually...
re run using  -f

and it still loops and times out.

 very braindead install process. looks like there is no way for
spamassassin to install, I never recall having this problem ever 
before  on

all 3.x versions, but 4.0.0 is a useless bitch,  i'm about to install
rspamd


I'm sorry to hear that you're having such problems. I don't know of any 
major changes to the install process in 4.x, so without any specific 
details I can't really offer a solution.


I can say that AS ALWAYS it is a bad idea to build and test ANY software 
as 'root', and SA does not accommodate doing so. There may well be places 
where the tests fail slowly if you run them as root. The only step you 
should perform as root is the actual installation. Another possible 
issue arises from how some platforms (e.g. RedHat) use Perl's 
"local::lib" mechanism by default, giving each user their own bespoke 
Perl environment. You must disable that when building and testing SA.
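
A sketch of starting the build from a cleaned Perl environment, assuming a Bourne-style shell (these are the usual local::lib variables; your platform may set others):

```shell
# Unset the usual local::lib variables so the build and tests see the
# system-wide Perl; run this as a regular user, not root.
env -u PERL5LIB -u PERL_MM_OPT -u PERL_MB_OPT -u PERL_LOCAL_LIB_ROOT \
    perl Makefile.PL
```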


I'm not sure how exactly one should use the cpan tool to install SA or 
any other Perl package designed as a system-wide facility. I think it is 
generally better to either use distro-provided packages or to do the 
real install from source with this arcane spell:


  perl Makefile.PL
  make && make test
  sudo make install

On Tue, Jan 30, 2024 at 1:36 PM Nick Edwards 


wrote:


Venting

Set up a new server today, took no time in postfix dovecot and 
amavisd,

apache roundcube, and everything, then came spamassassin

thankfully I chose to install this whilst we left for lunch, but 
45mins
later to my horror it was still trying to install, why?  because its 
tests

failed for timeouts this, timeouts that,  everytime its set keeps on
retrying reporting

error: config: no rules were found!  Do you need to run 'sa-update'?
config: no rules were found!  Do you need to run 'sa-update'?

of fricken course there is no rules, its a new fricken install that 
cpan

hasn't got around to yet to allow us to run sa-update.

perhaps spamassassin developers can consider not everyone is 
upgrading,
there are some of us trying to get the fricken thing on the fricken 
machine

in the fricken first place.

I am not going to run cpan with force because that may hide *real* 
errors.






--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Dinged for .Date

2024-01-17 Thread Bill Cole

On 2024-01-16 at 18:33:23 UTC-0500 (Tue, 16 Jan 2024 17:33:23 -0600)
Noel 
is rumored to have said:

This - getting a .com domain to send mail - is really the only choice 
you have.


I have not seen major problems with *.net or *.org domains getting 
deliverability and some ccTLDs have reasonably decent reputations.


But yes, a *.com is how most people would want to go.

If Spamassassin were to whitelist your domain *today*, it would be some 
period of time until all the people running SA have the updated rules. 
I don't know how long, but I'm guessing many months. For some, years.


The long tail is long, but since we encourage all sites to get updates 
daily, the sites which lag more than a week are likely failing in many 
other ways as well. The long tail is very low. If I put a rule into my 
SA sandbox tonight, and it is good enough, it will be on most SA 
machines within 4-5 days and will be essentially everywhere worth caring 
about in 10. If Kevin makes a change in the KAM list, most of his users 
will have the rule the next day, as he does not depend on the RuleQA 
process.


SA removing .date from the lists of suspect TLDs would likely fix all 
noticeable problems the OP has related to SA within a fortnight. That 
*DOES NOT* mean their headaches from using a .date domain would end, 
because most users' mailboxes are not protected by SA directly or 
indirectly.


I also can't imagine that SA is the only software filter preventing 
you from successfully using your .date domain for mail, so fixing SA 
won't do anything for those others.


SA may have more installs than any other spam classification tool, but 
there's a broad understanding amongst the maintainers that none of the 
behemoth mailbox providers (Google, Microsoft, Yahoo/AOL/Oath, GMX, 
Apple, etc.) use SA in any way. Fastmail may, Runbox does (or did a few 
years ago,) Proton probably does, and it is pretty much universal in the 
small-scale mailbox provider/outsourcer world, to the extent that world 
still exists. And yet, we cannot compare in scale to the world that uses 
proprietary secret filters.


The alternative is playing whack-a-mole asking individual sites to 
whitelist you until the end of time.


In theory, yes. In practice, not so much. Once you get the big guys on 
board and educate direct business partners, the number and size of sites 
rejecting independently based on a TLD is not so big.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: symlinking config files

2024-01-05 Thread Bill Cole

On 2024-01-05 at 13:53:00 UTC-0500 (Fri, 5 Jan 2024 18:53:00 +)
Thomas Krichel 
is rumored to have said:


  Hi gang,

  my first post here.

  I'm running version 4.0.0-8 on debian testing. This is for
  Mailman. I have a script that creates a welcomelist for all my
  Mailman members. I include it via a symlink.

# ls -l /etc/spamassassin/88_mailman_members.cf
lrwxrwxrwx 1 root root 57 Jan  5 15:52 
/etc/spamassassin/88_mailman_members.cf -> 
../../home/mailman/opt/spamassassin/88_mailman_members.cf


  Clearly spamassassin follows the symlink and reads the file. I can
  see by just making a mistake in it, mistyping welcomelist as
  wlcomelist

root@tagol~# spamassassin --lint
Jan  5 17:58:51.081 [783424] warn: config: failed to parse line in 
/etc/spamassassin/88_mailman_members.cf (line 1248): wlcomelist_from 
kric...@openlib.org

root@tagol~#

  But

# spamc -R < /tmp/test.mail

  does not see the welcomelisted user. It's only when I remove the
  syslink, and replace it with the file


Why would you think spamc ever sees any SA rules file? That would pretty 
much destroy any excuse for using spamc/spamd.


You probably only needed to restart spamd.



rm /etc/spamassassin/88_mailman_members.cf
cp /home/mailman/opt/spamassassin/88_mailman_members.cf 
/etc/spamassassin/88_mailman_members.cf


  and restart

# systemctl restart spamd


Should have tried that first...



  that

# spamc -R < /tmp/test.mail

  sees the welcomelisted user. I am puzzled by this.


--
  Written by Thomas Krichel http://openlib.org/home/krichel on his 
21399th day.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Question about forwarding email (not specifically SA, pointers greatly appreciated)

2024-01-03 Thread Bill Cole

On 2024-01-03 at 14:17:11 UTC-0500 (Wed, 3 Jan 2024 13:17:11 -0600)
Thomas Cameron via users 
is rumored to have said:

The rub is, I want all emails to presid...@example.org to be forwarded 
to presidents_real_addr...@gmail.com. Since the forward happens at 
mail.example.org, the "from" is from some other domain from 
example.org, so it fails all the tests.


Indeed: your solution is known as "SRS" (Sender Rewriting Scheme) and it 
has multiple implementations. If you forward mail, you will break SPF 
unless you fix the envelope sender so that it uses a domain that 
permits the example.org server to send for it.


OR, you could instead deliver to a POP mailbox locally and have users 
fetch from there instead of simply forwarding mail to them. This also 
avoids a completely distinct problem of places like GMail deciding that 
your org's mail server is a spamming service because it is forwarding 
spam. If users POP their mail instead of having it forwarded via SMTP, 
that does not happen.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: MS-relayed spam

2024-01-02 Thread Bill Cole
m; dkim=none
(message not signed); arc=none (0)


Weird that this says the message wasn't signed when MS saw it...


DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=x1r862t.onmicrosoft.com; s=selector1-x1r862t-onmicrosoft-com;
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
bh=cMMl8FFbE2iyyDXVN5kGmj7djfYu1Ef14DADjnKqLVc=;
b=R1X4dpKSgryTH6OLmMzRy/tDWLnQEV8mHOEEtjH+lXKLhUWP1IcSU7ti48ZJoXOksGz7A4+ZbSb5s1wNp2A4dGS+psXMeDNERbCeNVeGFRy/0AfJX4BSO52imrh48OaXFvTjmcrwSondZQkeC2plLlatu2jWPXn+a48T+gCuUZtFOpy6+1OlQqtOhQd5Ork4w7yD6nIicaXcQ4GhpDX1YM6zU02EUOSl+pxEgJj5/WuHvXNbtuTmdsGid1JhRnmIyvR15jGzXHkyrD/KYHw3evZSOV8pJ8EMpUPDEiwdHjDGYt38j/Wwiho5yVfR/zNZa5wELOq9bYgLK0G91JywQA==


When there's a signature. Which your SA says was good.

X-MS-Exchange-Authentication-Results: spf=none (sender IP is 
193.176.158.140)

smtp.helo=mail.acquiretm.com; dkim=none (message not signed)
header.d=none;dmarc=none action=none 
header.from=x1r862t.onmicrosoft.com;


Again, a bit weird... MS says there's no DKIM, but there is.


Date: Mon, 01 Jan 2024 20:19:49 +0100
Importance: high


Never a good sign...


Subject: Your iCloud Storage Is Full. Receive 50 GB for FREE
X-TOI-MSGID: <1660898088.4bdab4ab9e89d.1704136789...@acquiretm.com>
In-Reply-To: 
<952htcjgcsdxt5hydix5kfocgsan34o2gphcyv...@egw.x1r862t.onmicrosoft.com>

Content-Type: text/html; charset="UTF-8"
CC: myem...@mydomain.com
To: myem...@mydomain.com


An idiosyncrasy...


MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Storage Notice 


So, how many legit correspondents have email addresses matching 
/info_[a-zA-Z]{11}@[a-z0-9]{7}.onmicrosoft.com/ and send your users 
email?
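As a quick sanity check of that pattern (a hedged sketch; the function name is made up, and the dots are escaped here where the inline version omits them):

```python
import re

# Generic "info_" local parts at short, machine-generated onmicrosoft.com
# tenant names, as suggested above.
SUSPECT_FROM = re.compile(r"^info_[a-zA-Z]{11}@[a-z0-9]{7}\.onmicrosoft\.com$")

def looks_suspect(addr: str) -> bool:
    """True when the address matches the throwaway-tenant pattern."""
    return SUSPECT_FROM.match(addr) is not None
```

In SpamAssassin terms this would become a `header ... From:addr =~ /.../` rule, scored to taste.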



Message-ID:
<0e3b3785-6682-4c22-b6d7-87286c342...@cy4pepfee34.namprd05.prod.outlook.com>
X-EOPAttributedMessage: 0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: CY4PEPFEE34:EE_|CO6PR20MB3698:EE_


I'm guessing that CY4PEPFEE34 is an important identifier here.
Maybe CO6PR20MB3698 as well.
I'm not sure exactly what any of these X-* headers mean, as I'm not MS, 
but if they are correlating nonces, I'm fine with that.


[...]

X-OriginatorOrg: x1r862t.onmicrosoft.com


Ask yourself: Do I want email from non-paying Microsoft email customers?

If the answer is "NO!" then that header provides a hint at a strong 
rule.
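A minimal sketch of such a rule for local.cf (the rule name and score are illustrative, not part of the stock ruleset):

```
header   LOCAL_ONMICROSOFT_ORIG  X-OriginatorOrg =~ /\.onmicrosoft\.com$/i
describe LOCAL_ONMICROSOFT_ORIG  Originating org is a bare onmicrosoft.com tenant
score    LOCAL_ONMICROSOFT_ORIG  2.0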


X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Jan 2024 
19:23:21.7479

(UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
3b787f74-e97d-4744-853e-08dc0aff1ea0

X-MS-Exchange-CrossTenant-Id: aae3bce2-b5e6-4c64-9336-2909094ee8c9
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
TenantId=aae3bce2-b5e6-4c64-9336-2909094ee8c9;Ip=[193.176.158.140];Helo=[mail.acquiretm.com]


How many variations on a customer ID does MS need?


X-MS-Exchange-CrossTenant-AuthSource:
CY4PEPFEE34.namprd05.prod.outlook.com


Definitely: CY4PEPFEE34 is some sort of Tenant/Source identifier. 
Maybe helpful...



X-MS-Exchange-CrossTenant-AuthAs: Anonymous


Really? I wonder how often that happens? I'm always interested in 
anonymous auth (either 'auth')



X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CO6PR20MB3698


And there's that correlating nonce again...

I don't know if any of those thoughts will give ideas for good actual 
rules for you (or anyone), but they are what comes to mind when I look 
at those headers.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Spreadsheet::Excel ?

2023-12-29 Thread Bill Cole

On 2023-12-29 at 08:41:23 UTC-0500 (Fri, 29 Dec 2023 08:41:23 -0500)
Alex 
is rumored to have said:


Hi,

Barracuda recently announced they've identified a vulnerability in the
Spreadsheet::Excel library used by amavis in their appliances. I 
didn't

realize they were still using amavis and open source (and presumably
spamassassin?).
https://www.barracuda.com/company/legal/esg-vulnerability


Barracuda has never been entirely open about their components, but they 
started as a very typical Postfix/Amavis/SpamAssassin/ClamAV rig.


I don't have this library on my system - is there a plugin that 
enables

parsing of Excel spreadsheets for malicious code?


The OLEVBMacro plugin exists. It does not use Spreadsheet::Excel. Malice 
is out of scope, but since mailing around MS files with macros has never 
been a good idea, discriminating between malice and sheer blinding 
stupidity is non-critical.


In my experience it has been workable to just reject mail with .xls and 
.xlsx attachments by default at any Internet-facing MX. 20+ years of 
warnings about how reckless it is to share MS documents ought to suffice 
for anyone.
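The rejection itself belongs in the MTA or milter, but the detection step is simple. A hedged Python sketch of the idea (the extension list and function name are illustrative, not from any particular glue layer):

```python
from email.message import EmailMessage

BLOCKED_EXTS = (".xls", ".xlsx")

def has_blocked_attachment(msg: EmailMessage) -> bool:
    """True if any MIME part carries a filename with a blocked extension."""
    for part in msg.walk():
        name = part.get_filename()
        if name and name.lower().endswith(BLOCKED_EXTS):
            return True
    return False
```

A milter or content filter would call this on the parsed message and reject with a 5xx at the MX when it returns true.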



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Bayes always reject.

2023-12-13 Thread Bill Cole

On 2023-12-13 at 01:49:24 UTC-0500 (Wed, 13 Dec 2023 07:49:24 +0100)
Pierluigi Frullani 
is rumored to have said:


Hello all,
 I'm facing a strange problem.


Not really. MANY people run into this issue...

I've fed the Bayes DB for a while and now I would like to put it in use,
but all messages get BAYES_99 and a very high spam score.
I would like to understand why and troubleshoot this problem, but I
can't find a way.


The only reasons that can happen are:

1. All of your mail is in fact spam.
2. Your Bayes DB is mis-trained.

The fix (assuming #2) is to recreate the Bayes DB with proper training.

*IN THEORY* one could fix a corrupted DB by 'unlearning' messages which 
learned incorrectly, but as a practical matter that's usually a fantasy.


Most of the scanning and DB details that you included are not useful. 
You cannot fix the bad DB, you need to rebuild it.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: long delay with the new rules from 8 dec

2023-12-08 Thread Bill Cole

On 2023-12-08 at 05:43:28 UTC-0500 (Fri, 8 Dec 2023 11:43:28 +0100)
Mickaël Maillot 
is rumored to have said:


Forget what I said; it was a DNS issue unrelated to the updated rules.


An example of the Basic Axiom of System Administration:

 It is *ALWAYS* DNS.





Le ven. 8 déc. 2023 à 11:00, Mickaël Maillot 
 a

écrit :


Hi,

I just want to notify you that the new rules take a lot more time.
I updated my rules from 5/12 to 8/12 and now in my maillog I see a lot
of:
tests_pri_-100: 21005
tests_pri_-100: 14165
tests_pri_-100: 17684
tests_pri_-100: 23094

I reverted the ruleset back to 5/12 and it's back to 200 to 5000 ms.

Note: I also have some personal rules.

Am I the only one seeing this?




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: proper use of internal_networks?

2023-12-07 Thread Bill Cole




"Dan Mahoney (Gushi)"  writes:


Hey there all,

Recently, we noticed that one of our system's "cron" mails started
getting caught by our spam filter (because it had lots of hostnames in
it about failed ssh logins, which the uribl plugin didn't like).

This system is listed (v4 and v6) in trusted_networks -- and it sends
it straight to our MX host via v6.  (no SMTP auth)


trusted_networks are NOT machines that you trust to not send spam. They 
are machines on other people's networks which you trust not to forge 
Received headers and which only talk to you directly. Like a secondary 
MX. You don't really trust the networks or the machines or their users, 
but only their Received headers.


internal_networks should include machines which you expect to never send 
you spam and whose mail you want to see no matter how spammy it looks: 
machines (and users) that you are expected to fix so that they don't 
send spam. These are also expected to write trustworthy Received 
headers.
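A hedged local.cf sketch of the distinction (the addresses are documentation placeholders; note that internal_networks must be a subset of trusted_networks):

```
# Hosts you administer and are expected to fix if they emit spam:
internal_networks  192.0.2.0/24
# Those hosts plus others whose Received headers you trust but whose
# users you do not control, e.g. a secondary MX:
trusted_networks   192.0.2.0/24 198.51.100.25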




We're getting a warning about "unparseable relay", but I think that's
just the DMA [freebsd's default mailer] throwing it off:

Received: from dmahoney (uid 10302)
   (envelope-from dmaho...@bommel.dayjob.org)
   id 237584
   by bommel.dayjob.org (DragonFly Mail Agent v0.13 on 
bommel.dayjob.org);

   Thu, 07 Dec 2023 19:45:29 +


That is an extremely disappointing Received header. Is there another one 
indicating an SMTP handoff or is this a strictly local-local delivery?



I also noticed that the all_trusted rule did not fire -- perhaps,
again, because of the above unparseable relay.


Correct.

It would take some work to make that generally useful to SA except as an 
idiosyncratic indicator of local submission.



Is DMA putting a crappy header in that would cause this not to break
if we were running a local postfix/sendmail?


Probably, but I'm not 100% clear on how this mail is travelling. Can you 
clarify and/or provide more complete headers?



Maybe I'm unclear on how this all works, but I thought that putting a
host in trusted_networks basically sidestepped spam processing.


No. It ADDS the potential for processing that ingests the Received 
header written by the "trusted" machine.


There are ALL_TRUSTED and NO_RELAYS rules which have weak negative (ham) 
scores and I believe ALL_TRUSTED is shortcircuited by default in the 
distribution rules. I've written and tested a slight variant 
ALL_INTERNAL rule, but I saw no compelling reason to have it in the 
default rules.


SpamAssassin itself does not have any "don't look at this" switch. If 
you want your own mail to be exempted, you need to match some pattern in 
the mail that is unlikely to be spoofable. That can require any or all 
of setting the *_networks correctly, making sure the glue layer includes 
a parseable Received header, and making your DNS and machine naming 
nicely harmonious.



What's the "correct" way to do this?  These are boxes that do not
normally relay mail -- they only generate it from system reports and
cron jobs, and generally speaking, only to us.


Just don't send those messages to SpamAssassin. How you do that depends 
on your MTA and your choice of glue. I do this on my own tiny network in 
MIMEDefang, where I have bespoke non-portable code very specifically 
identifying my automated mail generators and not checking them with SA.


If you can't readily do that, make sure your mail generators don't 
create sloppy Received headers, the glue layer gives you a usable local 
Received header (a real issue with milters,) that you have proper 
*_networks settings, and all your machines call themselves by resolvable 
names which are actually theirs. With that sort of hygiene, you can then 
tweak the scores of ALL_TRUSTED and/or NO_RELAYS or craft special rules 
that give you some large negative score and probably shortcircuit those. 
It may also be enough to make all of those automated mail generators 
send to addresses which use one of these mechanisms (from  'perldoc 
Mail::SpamAssassin::Conf')


There are three levels of To-welcomelisting, "welcomelist_to",
"more_spam_to" and "all_spam_to". Users in the first level may 
still
get some spammish mails blocked, but users in "all_spam_to" 
should

never get mail blocked.

Those are all implemented through rules which you can adjust scores for 
and/or shortcircuit just to be sure.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: sa-learn on an Exchange public folder

2023-12-04 Thread Bill Cole

On 2023-12-03 at 14:58:36 UTC-0500 (Sun, 3 Dec 2023 20:58:36 +0100)
Emmanuel Seyman 
is rumored to have said:


Hello all.

I've set up SA at $WORK and now want to train the bayesian classifier.
To that end, a public folder has been setup on our Exchange server and
I want to run sa-learn on any email that is transferred to it.

I'm guessing this is a popular thing to do and that there would already
be a wrapper around sa-learn on GitHub, but my Google-fu seems to be
off today.

Is there such a wrapper or do I have to write my own script?



I am aware of no such script. The overwhelming majority of sites using 
SA use operating systems other than Windows and mail servers using open 
format standards like mbox and Maildir. Last I knew, Exchange folders 
were binary blobs in a format (PST?) that MS either does not document or 
documents poorly, but that could be a decade or more out of date.


SpamAssassin understands the standard format of Internet mail messages 
as defined in RFC822 and its successors. It also understands a few 
simple ways that RFC822 messages are packaged together (mbox, mbx, 
bsmtp) but Exchange only uses that format for sending mail over the 
Internet, while it uses its own proprietary formats internally.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Catch a rejected message ?

2023-12-01 Thread Bill Cole

On 2023-12-01 at 10:29:24 UTC-0500 (Fri, 1 Dec 2023 15:29:24 +)
White, Daniel E. (GSFC-770.0)[AEGIS] via users 
is rumored to have said:


We are using SpamAssassin 3.4.6-1 with Postfix 3.5.8-4 on RHEL 8

We are seeing occasional blocked messages that say “milter-reject” 
with a spam score of 8


Is there a way to capture the offending messages to figure out the 
problem ?


Only if there is functionality for that in the milter itself (i.e. the 
'glue' between Postfix and SA) that allows you to do so. SpamAssassin 
has no facility to save messages.


For example, MIMEDefang and its cousin MailMunge both use a unique 
working directory for each message, and it is trivial to just replicate 
that whole structure elsewhere for safekeeping.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: ATT RBL f---wits

2023-11-27 Thread Bill Cole

On 2023-11-27 at 16:31:52 UTC-0500 (Mon, 27 Nov 2023 14:31:52 -0700)
Philip Prindeville 
is rumored to have said:


We're being blacklisted by att.net with the following message:

   (reason: 550 5.7.1 Connections not accepted from servers without a 
valid sender domain.flph840 Fix reverse DNS for 24.116.100.90)


I don't know what the hell is up with these pinheads:

philipp@ubuntu22:~$ dig -tmx redfish-solutions.com. @8.8.8.8

; <<>> DiG 9.18.12-0ubuntu0.22.04.3-Ubuntu <<>> -tmx 
redfish-solutions.com. @8.8.8.8

;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58379
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;redfish-solutions.com. IN MX

;; ANSWER SECTION:
redfish-solutions.com. 21600 IN MX 10 mail.redfish-solutions.com.

;; Query time: 48 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Nov 19 15:08:29 MST 2023
;; MSG SIZE  rcvd: 71

philipp@ubuntu22:~$ dig -ta mail.redfish-solutions.com. @8.8.8.8

; <<>> DiG 9.18.12-0ubuntu0.22.04.3-Ubuntu <<>> -ta 
mail.redfish-solutions.com. @8.8.8.8

;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19570
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;mail.redfish-solutions.com. IN A

;; ANSWER SECTION:
mail.redfish-solutions.com. 21600 IN A 24.116.100.90

;; Query time: 72 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Nov 19 15:08:39 MST 2023
;; MSG SIZE  rcvd: 71

philipp@ubuntu22:~$ dig -x 24.116.100.90 @8.8.8.8

; <<>> DiG 9.18.12-0ubuntu0.22.04.3-Ubuntu <<>> -x 24.116.100.90 
@8.8.8.8

;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2371
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;90.100.116.24.in-addr.arpa. IN PTR

;; ANSWER SECTION:
90.100.116.24.in-addr.arpa. 21600 IN PTR mail.redfish-solutions.com.

;; Query time: 68 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Nov 19 15:08:55 MST 2023
;; MSG SIZE  rcvd: 95

philipp@ubuntu22:~$

So that's not the problem. You're supposed to be able to get the 
blacklisting fixed if you email abuse_...@abuse-att.net, but I've 
emailed them from 3 different addresses and have yet to get a response, 
much less a resolution.


Has anyone else had to deal with this bollocks and gotten it resolved?



Yes. Twice.

Time is your friend. AT&T still operates like it's 1970...



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: spamc -L does not return 5, or 6

2023-11-08 Thread Bill Cole
On 2023-11-07 at 18:23:19 UTC-0500 (Wed, 8 Nov 2023 00:23:19 +0100)
 
is rumored to have said:

> On 11/7/23 18:38, Cecil Westerhof wrote:
>> Matus UHLAR - fantomas  writes:
[...]
>>
>> They are imaps -> imap over ssh.
>> But that is not the problem. Spamc does what it should be doing,
>> except that it gives back 0 instead of 5 or 6.
>>
> It seems to be a documentation bug, see 
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6069 and 
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=1201#c47
>

Documentation fixed in r1913677

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire




Re: Getting error 74

2023-11-01 Thread Bill Cole

On 2023-11-01 at 07:50:38 UTC-0400 (Wed, 01 Nov 2023 12:50:38 +0100)
Cecil Westerhof 
is rumored to have said:


Since some time I see that when I want to update the spamassassin
filters I get error 74 for every email that I use to train the
filters. What could be happening here?


We really would need a lot more context to answer that.

In SOME contexts, '74' is defined as EX_IOERR. That would indicate a 
problem with the underlying storage (OR network connection, in some 
cases) used for your Bayes database.


What database are you using for Bayes?

What tool are you using to learn messages?

What platform are you running on? (OS, distro, perl version, etc.)

What version of SpamAssassin are you using?

I see that you asked about this same issue(?) on this mailing list in 
October 2018 but I do not see any resolution from that time...




--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: spamd: still running as root

2023-10-30 Thread Bill Cole

On 2023-10-30 at 12:45:31 UTC-0400 (Mon, 30 Oct 2023 16:45:31 +)
Linkcheck via users 
is rumored to have said:

I have just updated Debian to Bookworm in order to install SA 4. Very 
few problems so far, but the Postfix log is giving:


"spamd: still running as root: user not specified with -u, not found, 
or set to root, falling back to nobody"


I am not sure where to specify an appropriate user (and possibly how 
and what). Help, please?


If you do not understand very specifically WHY you NEED spamd to run as 
some specific other user, DO NOT DO IT. If your only reason for asking 
this is the log entry, just forget about it.


'man spamd' provides more info.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: external API request

2023-10-27 Thread Bill Cole

On 2023-10-27 at 10:56:36 UTC-0400 (Fri, 27 Oct 2023 14:56:36 +)
DEMBLANS Mathieu 
is rumored to have said:


Hi,
Anyone know if there is a way to request an external API through a 
SpamAssassin plugin?


There is no existing SA plugin which implements an interface to any 
generic web API (such as REST endpoints) but there's no reason one could 
not write a plugin to access such external APIs. Spamhaus has done this, 
for example. Also see SpamAssassin::Plugin::URIDNSBL, which implements a 
process for using the DNSBL mechanism with URIs, as is used by multiple 
blocklist providers.


The goal is to take a URL extracted by SA from the body of a mail and 
check, via an API request, whether it is referenced on an external 
service (VirusTotal or another).


Look at how the various URIBL* rules and SpamAssassin::Plugin::URIDNSBL 
work.
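For reference, wiring a URI RHS blocklist through the stock URIDNSBL plugin looks roughly like this (the zone and rule name here are placeholders, not a real list):

```
urirhssub  URIBL_EXAMPLE  uribl.example.invalid.  A  2
body       URIBL_EXAMPLE  eval:check_uridnsbl('URIBL_EXAMPLE')
describe   URIBL_EXAMPLE  Contains a URI listed in the example blocklist
score      URIBL_EXAMPLE  3.0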



We receive some mails with URLs inside whose target pages contain malware.
One day, a user will click on one...
If I can junk them before that, it would be great.


The current URIBL* rules may be helpful, if you are able to use them. If 
you use other people's open DNS resolvers, that can take a small amount 
of work, to stand up your own autonomous caching resolver.


If you need a web API backend rather than a DNSBL, you would need to 
write that plugin specifically for that backend.


We do accept feature requests in the Bugzilla, but at this point we do 
not have a list of developers waiting to take on the feature request 
list, so if you really need it, you'd need to create it yourself.





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Missing Mail::SpamAssassin::Plugin::WelcomeListSubject

2023-10-26 Thread Bill Cole

On 2023-10-26 at 10:14:44 UTC-0400 (Thu, 26 Oct 2023 15:14:44 +0100)
Linkcheck via users 
is rumored to have said:

I have just had reason to run --lint (first time in a week) and it 
failed drastically. This is on a well-established Postfix mail server 
(but currently no real users) running 3.4.6 on Perl 5.32.1 on Debian 
Bullseye. The result of --lint is...


Oct 26 14:39:02.888 [121778] warn: plugin: failed to parse plugin 
(from @INC): Can't locate 
Mail/SpamAssassin/Plugin/WelcomeListSubject.pm in @INC (you may need 
to install the Mail::SpamAssassin::Plugin::WelcomeListSubject module) 
(@INC contains: /usr/share/perl5 /etc/perl 
/usr/local/lib/x86_64-linux-gnu/perl/5.32.1 
/usr/local/share/perl/5.32.1 /usr/lib/x86_64-linux-gnu/perl5/5.32 
/usr/lib/x86_64-linux-gnu/perl-base 
/usr/lib/x86_64-linux-gnu/perl/5.32 /usr/share/perl/5.32 
/usr/local/lib/site_perl) at (eval 109) line 1.


Your SA installation is broken.

WelcomeListSubject is a new module in v4, replacing WhiteListSubject. If 
you have anything referencing it in a 3.4.6 installation, you have 
something very wrong. The easiest fix is likely to be to remove and 
re-install SA.



with two added comments because the plugin was not found.

A reload just performed gives...


[ ... SNIP ... ]

Oct 26 14:38:53 bristolmail spamd[121772]: config: failed to parse 
line, skipping, in "/etc/spamassassin/w7_whitelist.cf": 
whitelist_subject Barstaple House


Whatever that file is, it is NOT part of the SA distribution. Consult 
the author of 'w7_whitelist.cf' for support of whatever configuration it 
includes.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: def_welcomelist_auth versus def_whitelist_auth in 60_welcomelist_auth.cf

2023-10-12 Thread Bill Cole

On 2023-10-12 at 12:09:48 UTC-0400 (Thu, 12 Oct 2023 12:09:48 -0400)
George A. Theall via users 
is rumored to have said:


In looking at the recent change to 60_welcomelist_auth.cf, I noticed
that the file has two sets of address patterns - one in
def_welcomelist_auth and the other in def_whitelist_auth - and that
they're not the same.  Should they be?


Yes.



~# perl -n -e 'print "$1\n" if (/^def_welcomelist_auth\s*(.+)$/);' 
/var/lib/spamassassin/3.004006/updates_spamassassin_org/60_welcomelist_auth.cf 
> /tmp/welcomelist
~# perl -n -e 'print "$1\n" if (/^def_whitelist_auth\s*(.+)$/);' 
/var/lib/spamassassin/3.004006/updates_spamassassin_org/60_welcomelist_auth.cf 
> /tmp/whitelist

~# diff /tmp/welcomelist /tmp/whitelist
56d55
< *@*.wellframe.com
728a728

> *@*.bark.com



Well, that needs fixing then...


# svn diff
Index: 60_welcomelist_auth.cf
===
--- 60_welcomelist_auth.cf  (revision 1912921)
+++ 60_welcomelist_auth.cf  (working copy)
@@ -794,6 +794,7 @@
 def_welcomelist_auth *@*.redditgifts.com
 def_welcomelist_auth *@*.tdworld.com
 def_welcomelist_auth *@*.thenorthface.com
+def_welcomelist_auth *@*.bark.com
 def_welcomelist_auth *@*.center.io
 def_welcomelist_auth *@*.movethisworld.com
 def_welcomelist_auth *@*.pgsurveying.com
@@ -1098,6 +1099,7 @@
 #   authentic emails
 #
 def_whitelist_auth *@*.indeed.com
+def_whitelist_auth *@*.wellframe.com
 def_whitelist_auth *@*.hyatt.com
 def_whitelist_auth *@*.sears.com
 def_whitelist_auth *@*.jcpenney.com

# svn commit -m "trued up welcome/white discrepancy"
Authentication realm: <https://svn.apache.org:443> ASF Committers
Password for 'billcole': ***

Sending60_welcomelist_auth.cf
Transmitting file data .done
Committing transaction...
Committed revision 1912923.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Getting phishing from sender in 60_welcomelist_auth.cf

2023-10-12 Thread Bill Cole

On 2023-10-12 at 10:24:11 UTC-0400 (Thu, 12 Oct 2023 10:24:11 -0400)
Ricky Boone 
is rumored to have said:


Thank you.  It was my mistake initially, as I was under the impression
that submitting unsolicited samples wasn't preferred, and was just
intending to raise awareness for others in case they see anything
similar.


Often one of us who has access to robust mail streams can find adequate 
evidence on our own. In this case the volume seems to have been rather 
low.




Attached is evidence with redactions.  Again, my apologies if the
original email came across as it may have, and also for the delay in
reporting (I was alerted to this yesterday afternoon).


No problem. Your analysis of the issue as a compromised SendGrid account 
appears to be right, which breaks the basis for having them in the 
default welcomelist.


Change committed:

# svn diff -r r1910021:r1912921 60_welcomelist_auth.cf
Index: 60_welcomelist_auth.cf
===
--- 60_welcomelist_auth.cf  (revision 1910021)
+++ 60_welcomelist_auth.cf  (revision 1912921)
@@ -546,7 +546,6 @@
 def_welcomelist_auth *@*.directgeneral.com
 def_welcomelist_auth *@*.subaru.com
 def_welcomelist_auth *@*.aexp.com
-def_welcomelist_auth *@*.usssa.com
 def_welcomelist_auth *@*.bestwesternrewards.com
 def_welcomelist_auth *@*.email-weightwatchers.com
 def_welcomelist_auth *@*.email-allstate.com
@@ -1523,7 +1522,6 @@
 def_whitelist_auth *@*.directgeneral.com
 def_whitelist_auth *@*.subaru.com
 def_whitelist_auth *@*.aexp.com
-def_whitelist_auth *@*.usssa.com
 def_whitelist_auth *@*.bestwesternrewards.com
 def_whitelist_auth *@*.email-weightwatchers.com
 def_whitelist_auth *@*.email-allstate.com






Re: Getting phishing from sender in 60_welcomelist_auth.cf

2023-10-12 Thread Bill Cole

On 2023-10-11 at 22:02:22 UTC-0400 (Wed, 11 Oct 2023 22:02:22 -0400)
Ricky Boone 
is rumored to have said:


My apologies.

The samples that I have contain email addresses that I am not at
liberty to share without redacting.  If it's okay that there are
certain strings that are removed, I should be able to make them
available.  Is there a preferred method for getting this to you?


Attached to a message here or to a bug report in the SA project 
Bugzilla: https://bz.apache.org/SpamAssassin/


Ideally, just redact the local part of user addresses. Nothing else is 
really sensitive in spam, and facts like domains and IP addresses help 
validate spam analysis. For example, we wouldn't want to de-list a 
domain which appears to be forged into spam.


The point of having a minimally-redacted message as an openly visible 
example for removing a def_welcomelist entry is to make sure that we 
aren't open to being used for mischief and can justify the removal later 
if asked to. The bar for removal is very low (being listed is a 
privilege, not a right) but it can't be simply 'someone said...'






On Wed, Oct 11, 2023 at 9:25 PM Bill Cole
 wrote:


On 2023-10-11 at 16:45:15 UTC-0400 (Wed, 11 Oct 2023 16:45:15 -0400)
Ricky Boone 
is rumored to have said:


Just a heads up, it appears that usssa[.]com has had their SendGrid
email sending account popped, and a bad actor has been sending
phishing emails from it.  The domain is defined in
60_welcomelist_auth.cf with def_welcomelist_auth/def_whitelist_auth
entries with *@*.usssa.com.


If anyone has a shareable sample spam to substantiate this, that would 
be helpful.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Getting phishing from sender in 60_welcomelist_auth.cf

2023-10-11 Thread Bill Cole

On 2023-10-11 at 16:45:15 UTC-0400 (Wed, 11 Oct 2023 16:45:15 -0400)
Ricky Boone 
is rumored to have said:


Just a heads up, it appears that usssa[.]com has had their SendGrid
email sending account popped, and a bad actor has been sending
phishing emails from it.  The domain is defined in
60_welcomelist_auth.cf with def_welcomelist_auth/def_whitelist_auth
entries with *@*.usssa.com.


If anyone has a shareable sample spam to substantiate this, that would 
be helpful.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Pre-processor for spamassassin

2023-10-08 Thread Bill Cole

On 2023-10-08 at 03:38:00 UTC-0400 (Sun, 8 Oct 2023 18:38:00 +1100)
Erik de Castro Lopo 
is rumored to have said:


Hi,

I am in the process of writing a pre-processor for SpamAssassin. It 
would be a pre-processor because I do not read or write Perl.


That would be a solid reason not to attempt a SA 'plugin' but you really 
should be considering whether the analysis you are trying to do can be 
implemented as local custom rules. SA rules do not require knowledge of 
Perl.
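
For instance (the rule name and pattern here are invented for 
illustration, not from any stock ruleset), a local rule in local.cf 
needs no Perl at all:

```
# Illustrative local rule: match a phrase in the message body and score it.
body     LOCAL_DEMO_PHRASE   /limited[- ]time offer/i
score    LOCAL_DEMO_PHRASE   1.5
describe LOCAL_DEMO_PHRASE   Body mentions a limited-time offer
```

Scores and patterns are things you tune locally; the point is only that 
the rule language itself is regex-plus-keywords, not Perl.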


The idea would be to analyse each email and, based on the analysis, 
add extra fields to the email header before passing the email to 
SpamAssassin to do its thing.

My first question is: will SA detect these new headers and use them as 
part of its analysis?


SA has access to the entire message. SA rules can examine any header, 
but SA won't do anything more than treat an arbitrary header as a series 
of meaningless tokens for Bayesian classification unless you add rules 
that specifically interpret those headers.
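
As a sketch, assuming a hypothetical X-Preproc-Verdict header that your 
preprocessor adds (the header and rule names are made up here), a rule 
interpreting it could look like:

```
# Hypothetical: act on a verdict header added by a trusted upstream preprocessor.
header   LOCAL_PREPROC_SPAM  X-Preproc-Verdict =~ /^spam$/i
score    LOCAL_PREPROC_SPAM  3.0
describe LOCAL_PREPROC_SPAM  Upstream preprocessor judged this message spam
```

Note that this only makes sense if the preprocessor strips any 
pre-existing copy of that header from inbound mail, so a sender cannot 
forge it.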



Assuming the above is true, I have a couple of options:

  1) Always add each new header with a score (which I do not think 
 would be very effective).


SA does not look for 'scores' in headers, as it has no way to know what 
a score might look like or mean, and it can't *generally* trust anything 
inside a message as meaning what it claims to mean; e.g. you can't just 
send mail with a "X-Spam-Score: -200" header and expect SA to treat that 
as a score. For SA to interpret the content of a header as justifying a 
score, it needs rules that interpret it.
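
A sketch of what "rules that interpret it" could mean, assuming a 
hypothetical X-Preproc-Score header written by your own preprocessor: SA 
rules cannot do arithmetic on a header value, but a regex can 
approximate a numeric threshold:

```
# Hypothetical: preprocessor writes e.g. "X-Preproc-Score: 7.2".
# SA cannot compare numbers, but this regex approximates "score >= 5"
# by matching 5-9 or any two-digit value, with optional decimals.
header   LOCAL_PREPROC_HIGH  X-Preproc-Score =~ /^(?:[5-9]|[1-9][0-9])(?:\.[0-9]+)?\s*$/
score    LOCAL_PREPROC_HIGH  2.5
describe LOCAL_PREPROC_HIGH  Upstream preprocessor score at or above 5
```

This is clumsy compared to just emitting a categorical verdict header, 
which is one reason option 2 below tends to work better.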



  2) Only add a new header if the detected feature is probably spam.


If you are absolutely set on the design concept of putting something in 
front of SA, that's probably better. OR, if this is intended mainly to 
protect non-spam, only tag that. In any case, you'll need custom SA 
rules to understand any sort of meaning in what you add.


Depending on the specific sort of analysis you are doing, it may be 
feasible to do it with a construct of SA rules, and that would avoid the 
housekeeping issues of how to integrate a 'preprocessor' with your 
existing MTA and whatever you're using as 'glue' for SA 
(content_filter script, spamass-milter, MIMEDefang, etc.)



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Filtering emails from word-oliv...@somewhere.com

2023-10-05 Thread Bill Cole

On 2023-10-05 at 03:41:59 UTC-0400 (Thu, 05 Oct 2023 14:41:59 +0700)
Olivier 
is rumored to have said:


Hi,

Recently I have received a wave of mails in the form
From: word-olivier@somewhere.random
To: oliv...@mydomain.com

Where the "olivier" part is a valid username on my domain.

Is there a rule to catch these with SA?


SA does not have any way to know what the valid usernames in any domain 
are. Without custom local rules, it doesn't even know what domains might 
be valid for your mail system. You can, of course, create local rules 
for specific users who get heavily targeted by this tactic. That does 
not scale, but it can be useful.
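
A sketch of such a per-user rule, with the pattern adapted to the 
example above (the rule name and score are illustrative and would need 
local tuning):

```
# Illustrative per-user rule: sender address of the form "<word>-olivier@...".
header   LOCAL_TAGGED_OLIVIER  From:addr =~ /^\w+-olivier@/i
score    LOCAL_TAGGED_OLIVIER  2.0
describe LOCAL_TAGGED_OLIVIER  Sender uses a word-olivier style tagged address
```

The :addr modifier makes the rule match against just the address part of 
the From header, so display-name noise doesn't matter.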


Special rules for high-spam individuals can also help by acting as 
"canary" rules, if you use the 'autolearn_force' rule tflag. This way, 
when a spammer using the specific pattern starts a run, you will catch 
one match, autolearn it as spam, and (hopefully) recognize its sibling 
messages as such.
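
A sketch of such a canary rule for the example address pattern (names 
and scores are illustrative; adjust locally):

```
# Illustrative canary rule: score it high enough to matter, and use the
# autolearn_force tflag so a hit relaxes Bayes' usual learning requirements.
header   LOCAL_CANARY_OLIVIER  From:addr =~ /^\w+-olivier@/i
score    LOCAL_CANARY_OLIVIER  4.0
tflags   LOCAL_CANARY_OLIVIER  autolearn_force
describe LOCAL_CANARY_OLIVIER  Canary: word-olivier tagged sender
```

As I understand it, autolearn_force relaxes the usual header/body point 
requirements for autolearning; the message still has to cross the 
autolearn spam threshold on its final score.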





--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

