Re: blocking compute-1.amazonaws.com

2024-10-11 Thread John Hardin

On Fri, 11 Oct 2024, Marc wrote:


Can't we just block hosts whose reverse lookup resolves to a name under 
compute-1.amazonaws.com? Amazon has its own SMTP range, or am I wrong?


If you just want to block those outright, look at the features in your MX 
(sendmail, postfix, etc.) - it very likely has features to do a reverse 
lookup of the sender's IP and whitelist/blacklist on the resulting domain 
name, so you can block the sender at SMTP time.
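
To make the matching step concrete, here is a minimal Python sketch of the suffix test an MTA-side check would apply to the PTR name it gets back. The function name and the sample hostnames in the usage note are illustrative assumptions, not taken from any MTA; the lookup itself and the rejection are left to the MX.

```python
def is_blocked_ptr(ptr_name, blocked_suffixes=("compute-1.amazonaws.com",)):
    """Return True if a reverse-DNS (PTR) name falls under a blocked suffix."""
    # Normalise: PTR names may carry a trailing dot and mixed case.
    name = ptr_name.rstrip(".").lower()
    for suffix in blocked_suffixes:
        suffix = suffix.lower()
        # Match the suffix itself or any label beneath it, never a partial label.
        if name == suffix or name.endswith("." + suffix):
            return True
    return False
```

With this, a name such as ec2-3-80-1-2.compute-1.amazonaws.com would match, while a name under amazonses.com would not. Matching on whole labels avoids catching unrelated domains that merely end in the same characters.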


Don't get tunnel vision about SpamAssassin being the only tool available 
for this sort of thing... :)



--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Joseph Goebbels had a Ministry of Truth.
  Joseph Stalin had a Ministry of Truth.
  Joseph Biden has a Ministry of Truth.   -- Errol Webber
---
 25 days until the Presidential Election


RE: blocking compute-1.amazonaws.com

2024-10-11 Thread Marc
> 
> Marc skrev den 2024-10-11 09:24:
> > Can't we just block hosts whose reverse lookup resolves to a name under
> > compute-1.amazonaws.com? Amazon has its own SMTP range, or am I wrong?
> 
> urls have nothing to do with sending ips

this is a reverse hostname lookup



Re: blocking compute-1.amazonaws.com

2024-10-11 Thread Bill Cole

On 2024-10-11 at 03:24:00 UTC-0400 (Fri, 11 Oct 2024 07:24:00 +0000)
Marc 
is rumored to have said:

Can't we just block hosts whose reverse lookup resolves to a name under 
compute-1.amazonaws.com? Amazon has its own SMTP range, or am I wrong?


You are not wrong *as far as I know*

I have broad swaths of non-SES AWS network space (generally with names 
like that) blocked in various ways, and have never had a reason to make 
an exception.  At one point in the past, Spamhaus did so as well, 
although I'm not sure if that remains true.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: blocking compute-1.amazonaws.com

2024-10-11 Thread Benny Pedersen

Marc skrev den 2024-10-11 09:24:
Can't we just block hosts whose reverse lookup resolves to a name under 
compute-1.amazonaws.com? Amazon has its own SMTP range, or am I wrong?


urls have nothing to do with sending ips



Re: Whitelist or BAYES?

2024-10-03 Thread Bowie Bailey

On 10/1/2024 8:58 AM, Bill Cole wrote:


On 2024-09-30 at 16:22:49 UTC-0400 (Mon, 30 Sep 2024 16:22:49 -0400)
joe a 
is rumored to have said:

On 9/27/2024 04:05:51, Matus UHLAR - fantomas wrote:

On 26.09.24 10:27, joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a
local town Library (monthly) gets marked as spam as shown
below.
The BAYES_99 and BAYES_999 values are something I am
toying with for other reasons.  Seems odd these should hit
either one of those tests.

So, on the one hand I can add them to whitelist and be
done with it, or I can add
them to missed HAM for re-learning.

Which is the best approach?

so far, both. You may need to relearn multiple their (monthly)
mails before it has effect.

X-Spam-Report:
*  4.1 BAYES_99 BODY: Bayes spam probability is 99 to
100%
*  [score: 1.]
*  5.0 BAYES_999 BODY: Bayes spam probability is 99.9
to 100%
*  [score: 1.]

You have raised BAYES_99 and BAYES_999 to huge values so I
recommend to rethink that.

You some "don't because" examples?   Seems to me, off hand, that
if it's 99% or 99.9% then a high value does no harm.  Perhaps half
what I have would be sufficient though.

Bayes is a statistical method and so will always make some errors, as 
in this case. BY DEFINITION, one in a hundred messages hitting 
BAYES_99 will be ham, as will one in a thousand that hits BAYES_999.


I can't claim that the default scores are the best possible ones, but 
they don't result in many false positive *final scores* for most people.




Also, keep in mind that BAYES_999 is an add-on to BAYES_99.  Any message 
that hits BAYES_999 will also hit BAYES_99.  That is why the default 
score for BAYES_999 is only 0.2.


The way you have your scores set will ensure that any message that hits 
BAYES_999 will get 9.1 points added (4.1 + 5.0).  This may or may not 
work for you, but you should be aware of it.
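
The arithmetic can be sketched in a few lines of Python. The rule names and the 4.1, 5.0 and 0.2 values come from this thread; the helper name is made up for illustration, and this is not how SpamAssassin itself computes scores internally.

```python
def bayes_contribution(hits, scores):
    """Sum the scores of the Bayes rules a message hit."""
    return round(sum(scores[rule] for rule in hits), 3)

# Scores from the X-Spam-Report quoted in this thread.
custom = {"BAYES_99": 4.1, "BAYES_999": 5.0}
# Same BAYES_99 score, but with the default BAYES_999 add-on of 0.2.
with_default_999 = {"BAYES_99": 4.1, "BAYES_999": 0.2}

# BAYES_999 never fires alone; it always accompanies BAYES_99.
hits = ["BAYES_99", "BAYES_999"]
print(bayes_contribution(hits, custom))            # both custom scores stacked
print(bayes_contribution(hits, with_default_999))  # small add-on instead
```

Stacking the two custom scores pushes a message well past a typical spam threshold on Bayes alone, which is why the add-on rule is normally kept small.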


--
Bowie

Re: Whitelist or BAYES?

2024-10-01 Thread Bill Cole

On 2024-09-30 at 16:22:49 UTC-0400 (Mon, 30 Sep 2024 16:22:49 -0400)
joe a 
is rumored to have said:


On 9/27/2024 04:05:51, Matus UHLAR - fantomas wrote:

On 26.09.24 10:27, joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a local town 
Library (monthly) gets marked as spam as shown below.
The BAYES_99 and BAYES_999 values are something I am toying with for 
other reasons.  Seems odd these should hit either one of those 
tests.


So, on the one hand I can add them to whitelist and be done with it, 
or I can add

them to missed HAM for re-learning.

Which is the best approach?


so far, both. You may need to relearn multiple their (monthly) mails 
before it has effect.



X-Spam-Report:
*  4.1 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 1.]
*  5.0 BAYES_999 BODY: Bayes spam probability is 99.9 to 
100%

*  [score: 1.]


You have raised BAYES_99 and BAYES_999 to huge values so I recommend 
to rethink that.


You some "don't because" examples?   Seems to me, off hand, that if 
it's 99% or 99.9% then a high value does no harm.  Perhaps half what 
I have would be sufficient though.


Bayes is a statistical method and so will always make some errors, as in 
this case. BY DEFINITION, one in a hundred messages hitting BAYES_99 
will be ham, as will one in a thousand that hits BAYES_999.


I can't claim that the default scores are the best possible ones, but 
they don't result in many false positive *final scores* for most people.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Whitelist or BAYES?

2024-09-30 Thread joe a

On 9/30/2024 16:22:49, joe a wrote:

On 9/27/2024 04:05:51, Matus UHLAR - fantomas wrote:

On 26.09.24 10:27, joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a local town 
Library (monthly) gets marked as spam as shown below.
The BAYES_99 and BAYES_999 values are something I am toying with for 
other reasons.  Seems odd these should hit either one of those tests.


So, on the one hand I can add them to whitelist and be done with it, 
or I can add

them to missed HAM for re-learning.

Which is the best approach?


so far, both. You may need to relearn multiple their (monthly) mails 
before it has effect.



X-Spam-Report:
*  4.1 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 1.]
*  5.0 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
*  [score: 1.]


You have raised BAYES_99 and BAYES_999 to huge values so I recommend 
to rethink that.


You some "don't because" examples?   Seems to me, off hand, that if 
it's 99% or 99.9% then a high value does no harm.  Perhaps half what I 
have would be sufficient though.


* -0.1 DKIM_VALID Message has at least one valid DKIM or DK 
signature

* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
*  author's domain


you can safely welcomelist_from_dkim their mail address.

Can you expand on that a bit?  Did not know there was such an item.  
Is it obvious in the documentation?


I did find it clearly documented, eventually, but I need to use 
whitelist rather than welcomelist, as I am not yet on version 4.






Re: Whitelist or BAYES?

2024-09-30 Thread joe a

On 9/27/2024 04:05:51, Matus UHLAR - fantomas wrote:

On 26.09.24 10:27, joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a local town 
Library (monthly) gets marked as spam as shown below.
The BAYES_99 and BAYES_999 values are something I am toying with for 
other reasons.  Seems odd these should hit either one of those tests.


So, on the one hand I can add them to whitelist and be done with it, 
or I can add

them to missed HAM for re-learning.

Which is the best approach?


so far, both. You may need to relearn multiple their (monthly) mails 
before it has effect.



X-Spam-Report:
*  4.1 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 1.]
*  5.0 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
*  [score: 1.]


You have raised BAYES_99 and BAYES_999 to huge values so I recommend 
to rethink that.


Do you have some "don't because" examples?   Seems to me, offhand, that if it's 
99% or 99.9% then a high value does no harm.  Perhaps half what I have 
would be sufficient though.


* -0.1 DKIM_VALID Message has at least one valid DKIM or DK 
signature

* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
*  author's domain


you can safely welcomelist_from_dkim their mail address.

Can you expand on that a bit?  Did not know there was such an item.  Is 
it obvious in the documentation?





Re: ATTENTION: DNSWL to be disabled by default.

2024-09-29 Thread Łukasz Michalski
I expect a good number of „unconscious overusers“ behind the large 
resolvers (eg a typical Spamassassin admin with misguided DNS setup), 
but there are likely also „conscious overusers“ trying to blend into 
that group. The number of organisations can hardly be estimated with 
meaningful accuracy.


I was an unconscious over-user for some time.

The problem is that there is no info/hint anywhere that you have hit a limit. 
I was not aware of any limits at all. I started to analyze what was 
wrong after some spam started to get through. Then it took me a lot of time 
to figure out what was going on. From an admin's point of view, DNSWL in 
SpamAssassin started to behave randomly. Sometimes I got a "BLOCKED" 
response, sometimes 127.0.10.3 (listed HI in the "some special cases" 
category).  Back in 2021 a Google search did not give me any useful 
hints about what was wrong, or I was not able to enter the correct keywords 
for this problem. I eventually fixed my config and even tried to help others [1]


It should not work that way.

For my case, the solution should be:

An option DNSWL_ENABLED="off/on/auto" in spamassassin config (auto is 
default).


"auto" mode:
enable DNSWL if the host resolver is set to 127.0.0.1, or possibly to a DNS 
server with an IP in the same domain as the host where SpamAssassin runs. 
Every other DNS configuration should disable DNSWL at startup and then log an 
ERROR in the log file with a link to the documentation.


"on" will force SpamAssassin to use DNSWL no matter how DNS is set up. 
You have to set it manually, so it eliminates all unconscious overusers 
like me.
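
A rough Python sketch of the proposed decision logic follows. The DNSWL_ENABLED option does not exist in SpamAssassin; it is only the proposal above, and the function name is hypothetical. The sketch covers just the loopback case of "auto" and leaves out the "same domain as the host" variant.

```python
import ipaddress

def dnswl_enabled(mode, resolver_ips):
    """Decide whether DNSWL lookups run under the proposed off/on/auto option."""
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    # "auto": enable only when every configured resolver is a loopback address.
    return bool(resolver_ips) and all(
        ipaddress.ip_address(ip).is_loopback for ip in resolver_ips
    )
```

A caller would pass the nameserver addresses read from the system resolver configuration, and log an error pointing at the documentation whenever "auto" returns False.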


Regards,
Łukasz

[1] 
https://www.mail-archive.com/users@spamassassin.apache.org/msg108935.html




Re: ATTENTION: DNSWL to be disabled by default.

2024-09-28 Thread Matthias Leisi
(Answering on the SA Dev list, but Cc: to SA users since this list was also 
involved. I’d appreciate follow-ups on the SA dev list - Reply-To: set.)

> I can suggest that we run a statistical experiment by turning all non-.255 
> responses into .255 responses and then compare the rate of queries.

Things to keep in mind about the following data:

* The query sources and the query content are disassociated as the first step 
in gathering the data to ensure privacy. So we do not really know *who* is 
querying *what*.
* As a consequence, we can observe the „who is querying what“ only by looking 
at the data of a particular mirror for the list.dnswl.org zone from the moment 
the data is gathered until the log aggregation kicks in, but not later and not 
aggregated.
* Since we can only observe DNS traffic, and given the caching (especially with 
the relatively long TTLs used in this zone), this is only a proxy variable for 
actual mail traffic. Due to caching we overestimate the low usage and 
underestimate the high usage patterns (assuming that they profit more from 
caching).
* We throw away some log data to limit resource use, so the data we have in our 
database generally slightly underrepresents the actual numbers.


Some statistics on overall usage (all numbers rounded to avoid the impression 
of overly exact numbers):

* 332’000 sources querying the list.dnswl.org zone in the past 30 days
* of those, 13’100 sources have been doing more than 30 * 100’000 queries (ie, 
„consistent overusers“, and not just those who have a spike once in a while)
* 273 * 10^9 queries over the past 30 days overall
* Of these, ca 75% of the queries (200 * 10^9) have been issued by the 13’100 
„consistent overusers“
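
As a quick check of the figures in this list, the following Python sketch redoes the arithmetic. All inputs are the rounded numbers given above; the variable names, and reading "30 * 100'000" as a per-source daily threshold times 30 days, are assumptions for illustration.

```python
total_queries = 273e9      # all queries over the past 30 days
overuser_queries = 200e9   # queries issued by the "consistent overusers"
overuser_sources = 13_100
days = 30
daily_threshold = 100_000  # assumed per-source daily figure behind "30 * 100'000"

share = overuser_queries / total_queries
threshold_30d = days * daily_threshold
avg_per_source_per_day = overuser_queries / overuser_sources / days

print(f"share of traffic from overusers: {share:.1%}")
print(f"30-day threshold per source: {threshold_30d:,}")
print(f"average per overusing source per day: {avg_per_source_per_day:,.0f}")
```

The share comes out a little under three quarters, in line with the "ca 75%" stated, and the average overusing source sends roughly half a million queries per day, about five times the daily figure used for the threshold.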

A lot of overusers are using more than one source IP (and some like Google use 
*a lot* of IPs, both IPv4 and IPv6). A lot of the IPs completely lack PTR 
records, or are using them inconsistently. However we can roughly group the 
overuse:

* Large resolvers, both public and hoster-provided, namely Google, OpenDNS, 
Proxad, Cloudflare, OVH and similar.
* Individual organisations where it looks unlikely that the data is used for 
filtering purposes (outbound servers from Sendgrid with millions of queries per 
day?!) 
* Commercial vendors of e-mail (filter) services

We can guesstimate that the 13’100 sources equal to about 1’000 to 3’000 
overusing organisations in the second and third group. I’d call them „conscious 
overusers“, since they should have an understanding of what they are doing 
(however given the lack of action against any of the block results, the „should 
have“ in the previous sentence is a bold statement).

I expect a good number of „unconscious overusers“ behind the large resolvers (eg 
a typical SpamAssassin admin with misguided DNS setup), but there are likely 
also „conscious overusers“ trying to blend into that group. The number of 
organisations can hardly be estimated with meaningful accuracy.

We have ca 1’900 IP (ranges) with some form of block (we call this the „mirror 
ACL“):

aclaction     count
refuse            5
returnhi        430
parentblock    1417

If we only look at those which have „hits“ within the past 30 days:

aclaction     count
returnhi        229
parentblock     180

„refuse“ is the _BLOCKED result; „returnhi“ the 127.0.10.3, „parentblock“ is 
hiding the NS for list.dnswl.org (which would 
typically result in a SERVFAIL or NXDOMAIN for the NS records). There are also 
some exceptions which are not shown here (they are rare, and seem not to be 
actively used any more).

Since we only store positive results (ie those that did result in some form of 
response from our DNS mirrors) and not the results themselves, we can not tell 
the percentage of responses in refused / returnhi / parentblock (and a 
successful parentblock would not even make it into the logs).

All returnhi / parentblock have now been reverted to refuse. It will take 
several hours for this to be fully propagated (export / sync delay, and 
especially TTLs). We also attempted to identify some of the categories large 
resolvers / individual abusers and to add them to the „refuse“ acl action in 
order to have a more consistent experience.

We will let it run for about a week with all aclactions on „refuse“, and review 
the data. Since there is quite some natural fluctuation in the logs (throughout 
the days, over the week, and seasonally), it may need more than one week to get 
meaningful data.

-- Matthias, for the dnswl.org project






Re: whitelist_from not honored ?

2024-09-27 Thread Benny Pedersen

Xavier Humbert skrev den 2024-09-27 13:20:


To: r...@groumpf.org
From: root 

I don't understand why it is not whitelisted.


is internal_networks + trusted_networks set in local.cf ?

perldoc Mail::SpamAssassin::Conf

if it's local mail (root@ to root@) you should see ALL_TRUSTED hit

avoid whitelist_from, since it also matches forged sender addresses





Re: ATTENTION: DNSWL to be disabled by default.

2024-09-27 Thread Paul Stead
On Fri, 27 Sept 2024 at 19:57, Alex  wrote:

>
> FMBLA now also appears to be part of DNSWL.
>

Please note that dkimwl.org (and the associated fresh.fmb.la) are not
associated with dnswl.org, despite the similarity in name.

The dkimwl.org/fmb.la nameservers obey the free usage limits and standard
_BLOCKED returns for overuse and do not return bogus replies or ignore
queries.

That said the fmb.la nameservers seem to be responding fine from our
monitoring nodes.

Paul


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-27 Thread Alex
[abbreviated version, as gmail rejected my first attempt]

Hi, I've been following this thread on allowable query limits and have a few
questions. While I don't see any DKIMWL_BLOCKED or other *_BLOCKED rules
hitting in my logs, I am seeing timeouts related to their sub-rules like
this:

Sep 26 12:01:09 iceman amavis[545932]: (545932-11) SA info: async: aborting
after 3.923 s, deadline shrunk: AskDNS,
A/globaltestsupply-com-fresh-fmb-la, rules: __FROM_FMBLA_NEWDOM28,
__FROM_FMBLA_NEWDOM14, __FROM_FMBLA_NDBLOCKED, __FROM_FMBLA_NEWDOM
Sep 26 11:46:09 iceman amavis[539167]: (539167-11) SA info: async: aborting
after 3.386 s, deadline shrunk: AskDNS, A/mg-expediagroupm-lookup-dkimwl.org,
rules: __DKIMWL_BLOCKED, __DKIMWL_BULKMAIL, __DKIMWL_WL_MEDHI,
__DKIMWL_FREEMAIL, __DKIMWL_WL_MED, __DKIMWL_WL_BL, DKIM_BULKMAILER_FMBLA,
DKIM_WHITELIST_FMBLA, __DKIMWL_WL_HI

FMBLA now also appears to be part of DNSWL. I also have my own spamhaus
key, and it's never reported in any emails, so not sure it would be logged
here?
I've tried to create an account on dnswl[.]org but it's never verified and
they are unresponsive. Am I doing something wrong?
I'm using my own resolver. Could this be just a regular DNS timeout issue?
When I run "dig amiblocked.dnswl.org txt @127.0.0.1 +short" from the server
above it reports "no".


Re: whitelist_from not honored ?

2024-09-27 Thread Xavier Humbert

Le 27/09/2024 13:20, Xavier Humbert a écrit :

Surprisingly, while I have in whitelist.cf this line :
    whitelist_from r...@aragorn.groupf.org


Oh ! Just reading this I saw a typo in the address.

Just waiting until tomorrow for the next report

Regards,

Xavier

--
Xavier HUMBERT - Unix/Win/MacOSX Sysadmin/Network Engineer
https://www.amdh.fr





Re: Whitelist or BAYES?

2024-09-27 Thread Matus UHLAR - fantomas

On 26.09.24 10:27, joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a local town Library 
(monthly) gets marked as spam as shown below.
The BAYES_99 and BAYES_999 values are something I am toying with for other 
reasons.  Seems odd these should hit either one of those tests.

So, on the one hand I can add them to whitelist and be done with it, or I can 
add
them to missed HAM for re-learning.

Which is the best approach?


so far, both. You may need to relearn several of their (monthly) mails before 
it has an effect.



X-Spam-Report:
*  4.1 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 1.]
*  5.0 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
*  [score: 1.]


You have raised BAYES_99 and BAYES_999 to huge values so I recommend to 
rethink that.



* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
*  author's domain


you can safely welcomelist_from_dkim their mail address.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
99 percent of lawyers give the rest a bad name.


RE: Whitelist or BAYES?

2024-09-27 Thread Marc
> ---
>If guns kill people, then...
>  -- pencils miss spel words.
>  -- cars make people drive drunk.
>  -- spoons make people fat.
> ---

:) I was a bit surprised to see this here! I agree with this logic. 
I think even the US should allow people to carry rocket launchers, grenade 
launchers and explosives. Where in the 2nd Amendment does it say people are not 
allowed to carry these 'arms'?

I think they should combine this with a website where all people who 
advocate the right to this 2nd Amendment can register what school their kids 
go to. So a future school shooter can go there where he is welcomed.



RE: Whitelist or BAYES?

2024-09-27 Thread Marc
> 
> > So, on the one hand I can add them to whitelist and be done with it, or
> > I can add them to missed HAM for re-learning.
> >
> > Which is the best approach?
> 
> Do both.
> 

You will always have work to do. One user's spam is another user's delight. I 
have switched to having frontend servers reject and mark spam, then users can 
un-spam messages via a personal whitelist.



Re: Whitelist or BAYES?

2024-09-26 Thread John Hardin

On Thu, 26 Sep 2024, joe a wrote:

So, on the one hand I can add them to whitelist and be done with it, or 
I can add them to missed HAM for re-learning.


Which is the best approach?


Do both.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If guns kill people, then...
-- pencils miss spel words.
-- cars make people drive drunk.
-- spoons make people fat.
---
 3 days until the 83rd anniversary of the massacre at Babi Yar
 Disarmament enables genocide - Registration enables disarmament


Re: Whitelist or BAYES?

2024-09-26 Thread Kris Deugau

joe a wrote:

Maybe I should not ask this, but . . .

A relatively innocuous member informational email from a local town Library 
(monthly) gets marked as spam as shown below.
The BAYES_99 and BAYES_999 values are something I am toying with for other 
reasons.  Seems odd these should hit either one of those tests.

So, on the one hand I can add them to whitelist and be done with it, or I can 
add
them to missed HAM for re-learning.

Which is the best approach?


Both.  Feeding it to Bayes helps to correct its behaviour for both 
future messages from this sender and similar mail from others, and 
welcomelist_from_(whatever) ensures that future mail from this sender 
doesn't get caught.


I still use welcomelist_from_rcvd now and then for senders that are 
(still) variously fumbling - or outright skipping - SPF and DKIM.  :/


-kgd


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-26 Thread Andrew C Aitchison

On Thu, 26 Sep 2024, Matus UHLAR - fantomas wrote:


On 26.09.24 18:11, Peter wrote:
I'm not very proficient at SA rules so I won't attempt to write one for 
this, but perhaps this would help:


$ dig amiblocked.dnswl.org txt @1.1.1.1 +short
"You are blocked from using list.dnswl.org through public nameservers"
"yes"
$ dig amiblocked.dnswl.org txt @127.0.0.1 +short
"no"

It looks like the above test is definitive and works regardless of what 
other codes might be returned.


% dig amiblocked.dnswl.org txt @1.1.1.1
amiblocked.dnswl.org.   300 IN  TXT "no"

however this needs one more DNS lookup, which is the opposite of what we 
need.


If this were reliable, it could be used by system installers
to set the initial configuration to something appropriate for
the existing local DNS setup.

BTW today I get different results for open resolvers - 1.1.1.1 and 9.9.9.9 
return 127.0.6.2, 8.8.8.8 returns nothing (was 127.0.10.3 a while ago).


--
Andrew C. Aitchison  Kendal, UK
   and...@aitchison.me.uk


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-26 Thread Matus UHLAR - fantomas

Root Cause Analysis (in order):

1) DNSWL does not provide blocked codes.  That deviates from 
most DNS-query based systems.


On 24.09.24 20:43, Matthias Leisi wrote:

This is wrong.



On 26/09/24 01:20, Matus UHLAR - fantomas wrote:

I have checked with 1.1.1.1, where queries only return 127.0.10.3

It would help SA (and perhaps also DNSWL) if DNSWL would return 
127.0.0.255 in addition to 127.0.10.3


- there is already rule to suspend

header  RCVD_IN_DNSWL_BLOCKED   
eval:check_rbl_sub('dnswl-firsttrusted', '^127\.0\.\d+\.255$')

dns_block_rule RCVD_IN_DNSWL_BLOCKED list.dnswl.org


On 26.09.24 18:11, Peter wrote:
I'm not very proficient at SA rules so I won't attempt to write one 
for this, but perhaps this would help:


$ dig amiblocked.dnswl.org txt @1.1.1.1 +short
"You are blocked from using list.dnswl.org through public nameservers"
"yes"
$ dig amiblocked.dnswl.org txt @127.0.0.1 +short
"no"

It looks like the above test is definitive and works regardless of 
what other codes might be returned.


% dig amiblocked.dnswl.org txt @1.1.1.1
amiblocked.dnswl.org.   300 IN  TXT "no"

however this needs one more DNS lookup, which is the opposite of what we 
need.


BTW today I get different results for open resolvers - 1.1.1.1 and 9.9.9.9 
return 127.0.6.2, 8.8.8.8 returns nothing (was 127.0.10.3 a while ago).


many dnsbls support a BLOCKED reply, but only spamhaus supports a different 
reply for open resolvers - BLOCKED_OPENDNS (127.255.255.254).


SA reacts to BLOCKED by pausing for dns_block_time (default 300) seconds.

Of course, SA can't depend on the spamhaus reply for other DNSBLs, mostly 
because of different blocking criteria.
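
The pausing behaviour can be pictured with a small Python sketch. This is not SpamAssassin's implementation, only an assumed model of "stop querying a zone for a fixed number of seconds after a BLOCKED reply"; the class and method names are invented for illustration, and the 300-second default mirrors the dns_block_time value mentioned above.

```python
import time

class BlockTimer:
    """Suppress queries to one zone for a fixed period after a BLOCKED reply."""

    def __init__(self, block_seconds=300, clock=time.monotonic):
        self.block_seconds = block_seconds
        self.clock = clock          # injectable so the timer can be tested
        self.blocked_until = None

    def record_blocked(self):
        # Called when the zone answers with its BLOCKED code.
        self.blocked_until = self.clock() + self.block_seconds

    def may_query(self):
        # Queries are allowed if no block was seen or the pause has expired.
        return self.blocked_until is None or self.clock() >= self.blocked_until
```

A scanner would call may_query() before each lookup and record_blocked() whenever the blocked rule hits, so a single BLOCKED answer silences that zone for the whole pause.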


...as I said, if dnswl returned BLOCKED in addition to HIGH it would help 
SA at least a bit.


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
My mind is like a steel trap - rusty and illegal in 37 states.


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-25 Thread Peter

On 26/09/24 01:20, Matus UHLAR - fantomas wrote:

Root Cause Analysis (in order):

1) DNSWL does not provide blocked codes.  That deviates from most 
DNS-query based systems.


On 24.09.24 20:43, Matthias Leisi wrote:

This is wrong.


I have checked with 1.1.1.1, where queries only return 127.0.10.3

It would help SA (and perhaps also DNSWL) if DNSWL would return 
127.0.0.255 in addition to 127.0.10.3


- there is already rule to suspend

header  RCVD_IN_DNSWL_BLOCKED   
eval:check_rbl_sub('dnswl-firsttrusted', '^127\.0\.\d+\.255$')

dns_block_rule RCVD_IN_DNSWL_BLOCKED list.dnswl.org


I'm not very proficient at SA rules so I won't attempt to write one for 
this, but perhaps this would help:


$ dig amiblocked.dnswl.org txt @1.1.1.1 +short
"You are blocked from using list.dnswl.org through public nameservers"
"yes"
$ dig amiblocked.dnswl.org txt @127.0.0.1 +short
"no"

It looks like the above test is definitive and works regardless of what 
other codes might be returned.
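
If someone wanted to script that test, the dig output could be interpreted roughly as in the Python sketch below. The function name is hypothetical, and it assumes only what the two sample outputs above show: the TXT answers arrive as quoted strings, one of which is "yes" or "no".

```python
def amiblocked(txt_lines):
    """Interpret `dig amiblocked.dnswl.org txt +short` output lines."""
    # dig prints each TXT string wrapped in double quotes.
    answers = {line.strip().strip('"').lower() for line in txt_lines}
    if "yes" in answers:
        return True
    if "no" in answers:
        return False
    return None  # no usable answer (timeout, empty reply, unexpected text)
```

Returning None for anything unexpected keeps a failed lookup from being mistaken for "not blocked".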



Peter


Re: Bayes in V4 compared to V3

2024-09-25 Thread Grega via users
Oh god, I'm an idiot...


I had:

score BAYES_20 0.0


So now every mail has bayes score in it (changed it to score BAYES_20 0.1)


Still puzzling why I have no extremely low or extremely high values.

Also still puzzling why, out of 3 identical mails, one had BAYES_60 and the 
other 2 BAYES_20.


Autolearn is off.




From: Matija Nalis 
Sent: Wednesday, 25 September 2024 18:23
To: users@spamassassin.apache.org
Subject: Re: Bayes in V4 compared to V3

On Mon, Sep 23, 2024 at 01:14:25PM +, Grega via users wrote:
> Why one has "BAYES_60" and other 2 not?
>
>   4.  Race condition (IDK I`m not coder)

What backend are you using for storing Bayes data?

I'm not yet on 4.x (Debian Stable FTW), but in SA 3.x default was a
local file storage (BDB?) which used file locking, and that locking was
prone to timing out when several mails came in quick succession.

For me, switching to MySQL backend for Bayes (and AWL) fixed such issues...

--
Opinions above are GNU-copylefted.


Re: Bayes in V4 compared to V3

2024-09-25 Thread Grega via users
Hi.
I'm on the MySQL backend.
There is no load.


From: Matija Nalis 
Sent: Wednesday, September 25, 2024 18:24
To: users@spamassassin.apache.org
Subject: Re: Bayes in V4 compared to V3

On Mon, Sep 23, 2024 at 01:14:25PM +, Grega via users wrote:
> Why one has "BAYES_60" and other 2 not?
>
>   4.  Race condition (IDK I`m not coder)

What backend are you using for storing Bayes data?

I'm not yet on 4.x (Debian Stable FTW), but in SA 3.x default was a
local file storage (BDB?) which used file locking, and that locking was
prone to timing out when several mails came in quick succession.

For me, switching to MySQL backend for Bayes (and AWL) fixed such issues...

--
Opinions above are GNU-copylefted.


Re: Bayes in V4 compared to V3

2024-09-25 Thread Matija Nalis
On Tue, Sep 24, 2024 at 08:10:38AM +, Grega via users wrote:
> Also this:
> 
> Rule      Description                          Score  Total    Ham  Ham %  Spam  Spam %
> BAYES_40  Bayes spam probability is 20 to 40%   0.00  2,784  2,721   97.7    63     2.3
> BAYES_50  Bayes spam probability is 40 to 60%   0.80    126     93   73.8    33    26.2
> BAYES_60  Bayes spam probability is 60 to 80%   1.50    437    127   29.1   310    70.9
> BAYES_80  Bayes spam probability is 80 to 95%   7.00    266      1    0.4   265    99.6
> 
> I only have BAYES_40 to BAYES_80 after clearing the Bayes DB and manually 
> re-learning on 2500 ham and 2500 spam messages.
> So NO BAYES lower than 40 or higher than 80...
> 
> There is 100% something wrong here; Bayes is not a decision maker at all, for 
> me it is useless. This indecisiveness, along with the fact that some mails aren't 
> even Bayes scored, makes me think there is a bug or I implemented it wrong?

Perhaps running via "spamassassin -D -t" on a message would show why?
I'm suspecting the bayes is poisoned...

that might be due to spammer activity (esp. if you have
bayes_auto_learn enabled), or due to needless headers being classified,
for example.

in SA 3.x (not yet on 4.x), I had to bayes_ignore_header lots of
stuff to get it to perform reasonably, esp. Received headers, or those
would often overwhelm other more useful tokens (like body matches)

-- 
Opinions above are GNU-copylefted.


Re: Bayes in V4 compared to V3

2024-09-25 Thread Matija Nalis
On Mon, Sep 23, 2024 at 01:14:25PM +, Grega via users wrote:
> Why one has "BAYES_60" and other 2 not?
> 
>   4.  Race condition (IDK I`m not coder)

What backend are you using for storing Bayes data?

I'm not yet on 4.x (Debian Stable FTW), but in SA 3.x default was a
local file storage (BDB?) which used file locking, and that locking was
prone to timing out when several mails came in quick succession.

For me, switching to MySQL backend for Bayes (and AWL) fixed such issues...

-- 
Opinions above are GNU-copylefted.


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-25 Thread Sidney Markowitz

Greg Troxel wrote on 26/09/24 12:13 am:


It looks to me like there may be a middle ground that works for both SA
and dnswl.


Your suggestions are in a similar direction to discussions that have 
been moved over to the dev list as they have become more relevant to 
that mailing list. Our first test will be to see the effect of just 
returning 127.0.0.255 for overuse and see how that affects things, given 
the timed suppression of queries after each 127.0.0.255 is 
received.




Re: ATTENTION: DNSWL to be disabled by default.

2024-09-25 Thread Matus UHLAR - fantomas

Root Cause Analysis (in order):

1) DNSWL does not provide blocked codes.  That deviates from most DNS-query 
based systems.


On 24.09.24 20:43, Matthias Leisi wrote:

This is wrong.


I have checked with 1.1.1.1, where queries only return 127.0.10.3

It would help SA (and perhaps also DNSWL) if DNSWL would return 127.0.0.255 
in addition to 127.0.10.3


- there is already a rule to suspend queries:

header  RCVD_IN_DNSWL_BLOCKED   eval:check_rbl_sub('dnswl-firsttrusted', '^127\.0\.\d+\.255$')
dns_block_rule RCVD_IN_DNSWL_BLOCKED list.dnswl.org
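
To see which answers that rule's pattern would catch, here is a small Python sketch. The regular expression is copied from the rule above; the helper name is made up, and the sample addresses are the return codes mentioned in this thread.

```python
import re

# Pattern copied from the RCVD_IN_DNSWL_BLOCKED rule quoted above.
BLOCKED = re.compile(r"^127\.0\.\d+\.255$")

def is_blocked_answer(answer):
    """Return True if a DNSWL A-record answer matches the blocked pattern."""
    return BLOCKED.match(answer) is not None

for answer in ("127.0.0.255", "127.0.10.3", "127.0.6.2"):
    print(answer, is_blocked_answer(answer))
```

Only answers whose last octet is 255 match, so a 127.0.10.3 reply on its own never triggers the suspension, which is the gap being pointed out here.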


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fighting for peace is like fucking for virginity...


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-25 Thread Greg Troxel
Sidney Markowitz  writes:

> The third is the correct return code to indicate blocking for
> overuse. When SpamAssassin sees that, it hits a rule whose description
> is supposed to inform whoever looks at the mail headers or spam report
> that the ISP has misconfigured their SpamAssassin server's DNS or is
> at a usage level that requires a paid subscription to dnswl. In
> addition, to reduce the excessive load on dnswl, when SpamAssassin
> sees that code it temporarily disables further queries to dnswl, I
> think for the life of the running process. So we do what we can to
> make the 127.0.0.255 code have a useful effect.

It looks to me like there may be a middle ground that works for both SA
and dnswl.

  the primary method for overuse is to return 127.0.0.255 (not sure
  that's true vs SERVFAIL, but seems it could be)

  SA is coded to stop querying on receipt of blocked (already true)

  dnswl will only return a fake hi response if

- persistent overquerying

- the overquerying is not consistent with SA's behavior of stopping
  queries on receipt of 127.0.0.255

- at most 1 in N, for N somewhere between 2 and 10


This should result in

  - any SA installations stopping querying

  - send the spam-getting-through signal to abusers

  - false signals to SA installations limited to situations where they
are sharing a nameserver with an abuser, and then only rarely, as
a process should stop on receipt of a single 127.0.0.255
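
The proposal above can be sketched as a small decision function on the list operator's side. This is only an illustration under stated assumptions: the name `choose_reply`, the two boolean inputs, and the use of a random draw to get "1 in N" on average are all hypothetical, and nothing here describes how dnswl actually works.

```python
import random
from typing import Optional

BLOCKED = "127.0.0.255"   # overuse code named in this thread
FAKE_HI = "127.0.10.3"    # the "hi" answer discussed in this thread


def choose_reply(persistent_overuse: bool,
                 stops_after_blocked: bool,
                 n: int = 5,
                 rng: Optional[random.Random] = None) -> str:
    """Pick the answer for an over-limit query under the proposed policy.

    persistent_overuse:  the resolver keeps exceeding the limit (hypothetical input).
    stops_after_blocked: the query pattern looks like SA pausing after a
                         blocked answer (hypothetical input).
    n:                   a fake hi is sent on average once per n queries.
    """
    rng = rng or random.Random()
    # Anyone who is not a persistent overuser, or who behaves like SA,
    # only ever sees the blocked code.
    if not persistent_overuse or stops_after_blocked:
        return BLOCKED
    # Persistent overusers who ignore the blocked code occasionally get a fake hi.
    return FAKE_HI if rng.randrange(n) == 0 else BLOCKED
```

With this shape, an SA installation that stops on the first 127.0.0.255 never reaches the last branch unless it shares a resolver with an abuser.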


Another thought is for SA to have a way to send a dns query, perhaps to
ASF infrastructure, perhaps to someplace else, to figure out if it's
using shared DNS, and to refrain from dns queries entirely if so.
Then we can take "SA is sharing a resolver with abusers" off the table.



Re: ATTENTION: DNSWL to be disabled by default.

2024-09-25 Thread Jared Hall via users


On 9/24/2024 2:43 PM, Matthias Leisi wrote:



Root Cause Analysis (in order):

1) DNSWL does not provide blocked codes.  That deviates from most 
DNS-query based systems.


This is wrong.

— Matthias



Yes, I am wrong.  I *presumed* certain
operational characteristics.

The 127.0.10.3 response explains everything.

Thanks for chiming in.  I've always been amazed at
this SA User Group.


-- Jared Hall




Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Sidney Markowitz

Matthias Leisi wrote on 25/09/24 5:38 pm:
[...snip...]
> I can suggest that we run a statistical experiment

I'm moving this to the dev list in my full reply, as now we are getting 
more into things more suited to that mailing list.


 Sidney




Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Matthias Leisi


> 
> The situation is that dnswl has four possible responses when it acts on a 
> query that it has flagged as exceeding the limits of unpaid use: 1) reject 
> with SERVFAIL, 2) reject with BLOCKED, 3) return 127.0.0.255 which is code 
> for blocked, 4) return 127.0.10.3 which is code for "other service" HI 
> non-spam

127.0.0.255 is what is used in most cases (and was the only response we 
initially implemented). 

We had the following cases where this did not result in the number of queries 
being reduced (and please note that the source is not only SA):

* Large scale open resolvers. Additionally they keep adding/changing the IPs 
which they use to talk to auth NS, and on top they often lack rDNS, making it 
difficult to identify. Some of the clueless admins fall into this category.

* Large hoster resolvers (namely AWS, OVH, Hetzner, …): Sometimes we observe 
massive amounts of „enumerating queries“, clearly some outfit scanning large IP 
ranges over DNS. Likely not SA doing this, but some standard SA installations 
are likely „mixed in“.

* A number of security companies (likely) where the pattern indicates that they 
are „freeriding“ as part of the service to their customers (some well known 
names…). Some have changed and are hiding behind AWS nowadays (see first bullet 
point).

Now if one gets blocked results (.255) from a blocklist, the effect is usually 
visible right away. With a welcomelist, the effect may be more subtle (more 
false positives, but you may not immediately know why).

Getting persistent overusers to act thus needs a different response:

* Do not give them NS response for the list.dnswl.org response. 

* Return a SERVFAIL on the list.dnswl.org zone.

* Return a _HI response.

Neither of those is ideal, neither of them works in all situations, but doing 
nothing is also not an option in terms of resource usage.

> That policy clashes with the SpamAssassin PMC's explicit policy of not 
> supporting in our default configuration any dnsbl that responds to violation 
> of a "free for some" model by returning wrong information instead of a 
> specified BLOCKED code. If we allow that, it

The implemented actions are intended to keep up a „free for most“ model.

> Under some of those circumstances, the complaints go to SpamAssassin and the 
> pressure is put on us.

I‘m aware of it and have responded a number of times on the SA users list („use 
a local resolver, which you should anyway, and you‘re fine unless you really go 
way above 100k queries per day“).

> process after the first 127.0.0.255 is received you would feel better about 
> relying on that. Or maybe you can think of another compromise suggestion.

I‘m very open to suggestions for a better process / better actions with less 
collateral damage.

I can suggest that we run a statistical experiment by turning all non-.255 
responses into .255 responses and then compare the rate of queries. I‘m 
currently on business travel (and typing this mail on my phone 😅) so I could 
implement that on the weekend, and then give it a week or two to compare query 
loads (and identify some of the more obnoxious commercial abusers mentioned 
above).


— Matthias



Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Sidney Markowitz
Most of the messages on this thread, other than from bcole, have not been from 
members of the SpamAssassin PMC. I want to clarify our position and correct 
some details. I also want to see if dialog with you, Matthias, can lead to a 
better solution.

The situation is that dnswl has four possible responses when it acts on a query 
that it has flagged as exceeding the limits of unpaid use: 1) reject with 
SERVFAIL, 2) reject with BLOCKED, 3) return 127.0.0.255 which is code for 
blocked, 4) return 127.0.10.3 which is code for "other service" HI non-spam

The first two cannot be distinguished from various network or server errors, so 
SpamAssassin just drops the query with no result.

The third is the correct return code to indicate blocking for overuse. When 
SpamAssassin sees that, it hits a rule whose description is supposed to inform 
whoever looks at the mail headers or spam report that the ISP has misconfigured 
their SpamAssassin server's DNS or is at a usage level that requires a paid 
subscription to dnswl. In addition, to reduce the excessive load on dnswl, when 
SpamAssassin sees that code it temporarily disables further queries to dnswl, I 
think for the life of the running process. So we do what we can to make the 
127.0.0.255 code have a useful effect.
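
The handling described so far can be sketched in a few lines of Python. This is not SpamAssassin's code: the class name `DnswlQueryGate` and its methods are hypothetical, `None` stands in for any query that produced no usable answer (cases 1 and 2), and the "for the life of the process" behavior is taken from the tentative description above.

```python
from typing import Optional

BLOCKED = "127.0.0.255"  # overuse code named above


class DnswlQueryGate:
    """Tracks whether this process should keep querying the list (illustrative)."""

    def __init__(self) -> None:
        self.suppressed = False

    def should_query(self) -> bool:
        """False once a blocked answer has been seen in this process."""
        return not self.suppressed

    def record_reply(self, answer: Optional[str]) -> str:
        """Classify one reply and update the suppression flag."""
        if answer is None:
            # Cases 1 and 2: looks the same as a network or server error.
            return "no-result"
        if answer == BLOCKED:
            # Case 3: stop asking for the rest of this process.
            self.suppressed = True
            return "blocked"
        # Everything else is taken at face value, including 127.0.10.3.
        return "listed"
```

The last branch is the crux of the disagreement: nothing in the answer itself distinguishes a genuine high-trust listing from one sent to an overuser.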

The fourth return code 127.0.10.3 seems to be returned less frequently. Unlike 
what someone suggested, SpamAssassin cannot treat it the same as 127.0.0.255 
because there is at least one IP address in the dnswl database that has that 
code to indicate it is a high trust site, i.e., the code is used to indicate 
high trust.  It appears that dnswl.org policy is to return that code 
sporadically to what they consider "abusers" who are people running 
SpamAssassin or other services using dnswl who do not correct their 
configuration to use a local nameserver, purchase a subscription, or stop using 
dnswl, when they continue to get query failures and 127.0.0.255 codes. That 
results in some possible spam being flagged as dnswl high trust, which is a way 
of forcing the issue to someone's attention.

That policy clashes with the SpamAssassin PMC's explicit policy of not 
supporting in our default configuration any dnsbl that responds to violation of 
a "free for some" model by returning wrong information instead of a specified 
BLOCKED code. If we allow that, it would result in end-users receiving spam 
that has been labeled as ham, with resulting complaints going to us because it 
would be SpamAssassin labeling that spam as ham.

The policy from dnswl.org does work to put pressure to reduce the abuse. 
Unfortunately it is not as simple as the pressure being put only on the 
"abusers". It could be an ISP who hasn't bothered to buy a subscription to pay 
for their high use and has end users who complain about spam they receive as a 
result. Or it could be some organization with relatively low email volume but a 
clueless IT person who cheaps out by using a public nameserver like 1.1.1.1 or 
8.8.8.8, and nobody paying attention to the mail headers. Or a hosting service 
that makes SpamAssassin available to VPS customers installed via cpanel and 
nobody involved has a real clue what is going on.

Under some of those circumstances, the complaints go to SpamAssassin and the 
pressure is put on us.

We have fewer choices than an ISP. We can't pressure whoever runs the instance 
of SpamAssassin any more than you are already doing. We can't change the 
nameserver configuration or purchase a subscription on behalf of that ISP. We 
can stop using dnswl in our default configuration. If that is your preference, 
Matthias, then we are settled.

However, if the continued use of dnswl in the default SpamAssassin 
configuration is worth more to you than the benefits of putting that additional 
pressure on abusers (and clueless people who end up looking like abusers), 
maybe we can figure out something that works a little better than what we have 
been doing. Maybe we can come up with somewhat stronger language in the 
DNSWL_BLOCKED rule description. Maybe now that I told you that we stop 
subsequent queries in the process after the first 127.0.0.255 is received you 
would feel better about relying on that. Or maybe you can think of another 
compromise suggestion.

I can say that if you stop returning false HI results instead of BLOCKED we 
should always be willing to restore dnswl to our default configuration.

Regards,
Sidney Markowitz
Chair, Apache SpamAssassin PMC




Sep 25, 2024 06:44:29 Matthias Leisi :

> 
>> Root Cause Analysis (in order):
>> 
>> 1) DNSWL does not provide blocked codes.  That deviates from most DNS-query 
>> based systems.
> 
> This is wrong.
> 
> — Matthias
> 


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Greg Troxel
"Jared Hall via users"  writes:

> Here's the actual use case:
>
> 1) Stefan's a web guy.  He hosts his stuff at ScalaHosting.
> 2) ScalaHosting provides a one-click install of SpamAssassin.
> 3) Stefan doesn't know what DNS that SpamAssassin instance (think like
> a CloudWays App, or Digital Ocean droplet) is using.   It could be a
> public DNS; could be ScalaHosting's DNS.

The big point to me is that if ScalaHosting is selling "SpamAssassin as
a Service" then they are responsible for doing it correctly which
includes DNS routing to comply with reasonable query limits.

Basically, if you set it up yourself, you have to run your own resolver,
and if someone does it for you, it's on them to set it up right.

I see the point of unhappiness with incorrect data being returned, but
I'm ok with that as long as it's not automated and a last resort for IP
addresses that don't answer complaints via whois channels.



Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Tom Hendrikx




On 24-09-2024 16:10, Matus UHLAR - fantomas wrote:
TL;DR: Rather than using an in-band signal of a special reply value 
to queries from blocked users, as do other DNS-Based List operators, 
DNSWL.org sends back a "listed high" response to all queries. I was 
unaware


On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200) 
Matthias Leisi  is rumored to have said:
Not to all queries. It is sent to resolvers who consistently go above 
the limits, sometimes for months and years after receiving the 
blocked response.


On 24.09.24 09:13, Bill Cole wrote:
I don't see how that's significant. The documented policy is directly 
and intentionally harmful to users.


I understand this case as "abusers" instead of users.

Doing that is a legitimate choice by a reputation service, but it's 
not one SA can endorse. The fact that it is enforced by whim rather 
than mechanically is not a positive factor.


Is there any possibility to detect clients using open DNS, perhaps other 
than RCVD_IN_ZEN_BLOCKED_OPENDNS ?


Then, block all dnsbl/rhsbl rules?



Adding to ideas:

it might be helpful to have a way to trigger messages to syslog from a 
rule. Filling syslog with messages about blocked queries might be a 
better incentive/attention-grabber for ignorant/uninformed sysadmins to 
resolve DNS related issues than a non-scoring hit in the message headers.


Kind regards,
Tom


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Tom Hendrikx




On 24-09-2024 20:43, Matthias Leisi wrote:



Root Cause Analysis (in order):

1) DNSWL does not provide blocked codes.  That deviates from most 
DNS-query based systems.


This is wrong.



I agree. The DNSWL website clearly defines a list of specific response 
codes; otherwise SpamAssassin would not be able to differentiate between 
lo/med/hi trust levels.


The special case response code "listed, hi" is no different from the 
special response code tied to the x_BLOCKED rules that other RBL providers 
provide. Maybe Matthias can acknowledge that the code is not used for 
any other purpose than the one we're talking about, i.e. signalling 
severe abusive behavior?


The DNSWL approach may be non-standard, and their policy may be a bit 
hazardous for people not paying attention at all, but as we say in mail 
filter country: my server, my rules. DNSWL has their own set of rules. 
If you want to use the service, RTFM.


Adding some changes in 'rules/25_dnswl.cf' to support this special case 
seems trivial, and helps SA users to not shoot themselves in the foot:


Update 1 line:

header  RCVD_IN_DNSWL_HI  eval:check_rbl_sub('dnswl-firsttrusted', 
'^127\.0\.[0-9]\.3$')


Add a new rule:

header  RCVD_IN_DNSWL_BLOCKED_SEVERE_ABUSE 
eval:check_rbl_sub('dnswl-firsttrusted', '^127\.0\.10\.3$')
describe RCVD_IN_DNSWL_BLOCKED_SEVERE_ABUSE  ADMINISTRATOR NOTICE: The 
query to DNSWL was blocked due to severe abuse.  See 
http://wiki.apache.org/spamassassin/DnsBlocklists\#dnsbl-block for more 
information.
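
A quick way to check what the two patterns would and would not match is to try them in Python. This sketch assumes the intended patterns are `^127\.0\.[0-9]\.3$` (single-digit category, trust level 3) and `^127\.0\.10\.3$` (exactly 127.0.10.3); the `classify` helper is hypothetical and only mimics the rule names proposed above.

```python
import re
from typing import Optional

# Assumed intent of the two proposed rules (see lead-in).
HI_RE = re.compile(r'^127\.0\.[0-9]\.3$')
SEVERE_ABUSE_RE = re.compile(r'^127\.0\.10\.3$')


def classify(answer: str) -> Optional[str]:
    """Return the name of the proposed rule an answer would hit, if any."""
    if SEVERE_ABUSE_RE.search(answer):
        return "RCVD_IN_DNSWL_BLOCKED_SEVERE_ABUSE"
    if HI_RE.search(answer):
        return "RCVD_IN_DNSWL_HI"
    return None


if __name__ == "__main__":
    # Sample answers chosen for illustration only.
    for answer in ("127.0.5.3", "127.0.10.3", "127.0.11.3", "127.0.5.2"):
        print(answer, classify(answer))
```

One side effect worth noticing: with these patterns an answer such as 127.0.11.3 matches neither rule, so any other two-digit category would lose its "hi" hit.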


The new rule can have an informative score, as SA can't do anything 
about the situation. This will however remove the mechanism DNSWL is 
trying to apply, but in the bug I don't see any discussion on that 
stance. Maybe apply -2 instead of -5 for this special case?


Or did I overlook something?

PS Not posting to the dev- list as I'm not subscribed there.

Kind regards,
Tom


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Matthias Leisi

> Root Cause Analysis (in order):
> 
> 1) DNSWL does not provide blocked codes.  That deviates from most DNS-query 
> based systems.

This is wrong.

— Matthias



Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Anne P. Mitchell, Esq.


> 
> Maybe disable VALIDITY rule as well... They also have 10k limit in 30 days 
> window ..
> 
> My understanding is that Validity returns a specific value (127.255.255.255) 
> for blocked queries. 

I kept going back and forth as to whether to jump in on this thread and point 
out that our own positive reputation DNSRL, the GSL - or as many of you know 
it, and as it appears in the rules, the IADB - has always been and will always 
be free to query or xfer, and with no restrictions, because we consider the 
receiving community to be with whom we have our allegiance, and to whom we owe 
responsibility.  After all, the founder (me) came out of MAPS, and I have 
always adhered to (and made sure that ISIPP SuretyMail adheres to) the 
strictest of standards before a sender can be certified with us and have their 
IPs placed on the GSL.

We are incredibly proud of and grateful for our relationship with the SA 
community.  In fact, the model of using discrete IP-address-based data points 
was designed *specifically* with SA in mind, by me and Craig Hughes, so that SA 
could take full advantage of the granularity of the data.  We pioneered that 
model knowing others would copy it (which they did), and we were fine with that 
because it was a benefit to the receiving community which, after all, is the 
point.

Having run this by a trusted advisor in this community, I was encouraged to go 
ahead and post in this thread, so now I have.

Again, here is a clear statement:  The IADB ('GSL') is a positive reputation 
DNS-based list which is and always will be free to query, and free to transfer. 
 The only way for an IP to appear on the IADB is after strict vetting and 
making sure that the sender adheres to our own very high and strict standards.  
We also take spam complaints (the few we receive - only a handful a year) very 
seriously, and we have *zero* problem hitting a sender with a clue bat, and 
'firing' a sender if we find that they have veered towards the gray side after 
becoming certified with us.  (The fact that we charge a relatively small 
monthly sum to the senders makes firing them pretty painless. Thus it has 
always been - best practices over money *always* - we can take this stand 
because we are, always have been, and always will be, privately held, and the 
buck stops with me).

Anne

--
Anne P. Mitchell, Esq.
Email Law & Policy Attorney
Legislative Advisor
CEO Institute for Social Internet Public Policy
Author: Section 6 of the CAN-SPAM Act of 2003 (the Federal email marketing law)
Author: The Email Deliverability Handbook
Board of Directors, Denver Internet Exchange
Dean Emeritus, Cyberlaw & Cybersecurity, Lincoln Law School
Prof. Emeritus, Lincoln Law School
Chair Emeritus, Asilomar Microcomputer Workshop
Counsel Emeritus, eMail Abuse Prevention System (MAPS)






Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole
On 2024-09-24 at 12:59:51 UTC-0400 (Tue, 24 Sep 2024 12:59:51 -0400)
Jared Hall via users 
is rumored to have said:

> On 9/24/2024 10:10 AM, Matus UHLAR - fantomas wrote:
>>
>> I understand this case as "abusers" instead of users.
> One man's use is another man's abuse.  Limits are reached and False Negatives 
> are produced by DNSWL.
>
[analysis points elided]

> 1) Contraction in the Email services market; less "systems" expertise is 
> available.

A problem we've been battling forever. It won't be getting better any time soon.

> 2) DIY installs also "dumb-down" systems knowledge requirements.

Yes, and we can't hope to fight against that. Many systems these days are being 
packaged in ways that "just work" with minimal effort or knowledge, for the 
large majority of the user audience.

> 3) SA has a desire to provide some protection in a default installation.

Because we know that people do as little work as possible to get something 
"working" even if that's in a degraded state.

> 4) Migration to Zero-Trust environments.

Unclear relevance...

> 5) Integration of DNS into O/S (like the stub resolver problems in 
> Debian/Ubuntu) - can't just slap BIND on a machine anymore.

Sounds like a Linux problem :)

It's actually quite easy on many platforms to bring up an Unbound recursive 
resolver for local resolution. I think it was actually presented as a choice 
for the latest Alma(EL9) and FreeBSD machines I loaded from install media. Yes, 
the systemd resolver is garbage.

> I am 100% FOR dropping DNSWL, any way it is done, although I don't have any 
> problem with the existing handling of BLOCKED responses from Validity, 
> SpamHaus, and others.  It *seems to me* that DNSWL-type services are better 
> used as overrides at SMTP-time to DNSBL blocks.

People will always differ on which tools to use at which layer, but I tend to 
agree that positive reputation sources are best applied to keep mail away from 
ever hitting SA. SA is a semi-transparent black box. It makes mistakes 
*intrinsic to its design* in both directions. If you want to exempt mail from 
ever being blocked by SA, not showing it to SA is best. That's why I use a 
relatively heavyweight milter (MIMEDefang) which can choose whether or not to 
expose messages to SA.

>>> Doing that is a legitimate choice by a reputation service, but it's not one 
>>> SA can endorse. The fact that it is enforced by whim rather than 
>>> mechanically is not a positive factor.
>> Is there any possibility to detect clients using open DNS, perhaps other 
>> than RCVD_IN_ZEN_BLOCKED_OPENDNS ?
>>
>> Then, block all dnsbl/rhsbl rules?
>>
> I don't see any truly viable solution without conducting other lookups first. 
>   A possible alternative would be to configure an unrestricted open DNS 
> server that returns to the client, in response to a query, the IP address of 
> the DNS host from where the query originated.  Sort of like the old, 
> never-used, TCP Echo service.
>
> Of course, the devil is in the details.  But I like your thinking Matus :)  
> My mind is about as sharp as a cooked linguine noodle. I'm sure there are a 
> lot of people out there that can conjure up better solutions.

As I said in a previous message: patches are welcomed.


-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Jared Hall via users



On 9/24/2024 10:10 AM, Matus UHLAR - fantomas wrote:


I understand this case as "abusers" instead of users.
One man's use is another man's abuse.  Limits are reached and False 
Negatives are produced by DNSWL.


Here's the actual use case:

1) Stefan's a web guy.  He hosts his stuff at ScalaHosting.
2) ScalaHosting provides a one-click install of SpamAssassin.
3) Stefan doesn't know what DNS that SpamAssassin instance (think like a 
CloudWays App, or Digital Ocean droplet) is using.   It could be a 
public DNS; could be ScalaHosting's DNS.
4) ScalaHosting does offer their own mail servers for use for a fee.  
Now maybe that has functional DNS, we don't know, but Marketing/Sales 
being what it is, it is probably not a good sign for Stefan.
5) Stefan doesn't know all these particulars (#1 above).  He just knows 
it doesn't work.


Root Cause Analysis (in order):

1) DNSWL does not provide blocked codes.  That deviates from most 
DNS-query based systems.

2) ScalaHosting provides a "buggy" semi-functional package to their clients.
3) SA, being just a messenger, accurately reports the bogus False 
Negative from DNSWL.


Risk Analysis:

1) Anybody that's seen DNSWL's zone files knows that it is a useless 
arbiter of spamminess; not to mention the stale data therein.
2) SA default scores are tuned to 5, all instantly wiped out by a False 
Negative score of -5.
3) "One-Click" install packages are becoming more and more common with 
Cloud providers.
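
Point 2 is simple arithmetic, sketched here in Python. The threshold of 5 and the -5 come from the text above; the 7.5 total for the other rules is a made-up figure for illustration, and `is_spam` is a hypothetical helper rather than SpamAssassin's scoring code.

```python
REQUIRED_SCORE = 5.0   # spam threshold mentioned in point 2
DNSWL_HI_SCORE = -5.0  # score attributed to the false "hi" answer in point 2


def is_spam(rule_scores, threshold=REQUIRED_SCORE):
    """A message is treated as spam when its summed score reaches the threshold."""
    return sum(rule_scores) >= threshold


if __name__ == "__main__":
    other_rules = [7.5]  # hypothetical total from content and blocklist rules
    print(is_spam(other_rules))                     # judged without the DNSWL hit
    print(is_spam(other_rules + [DNSWL_HI_SCORE]))  # judged with the false "hi" hit
```

In this example the message sits at 7.5 without the DNSWL hit and at 2.5 with it, so a single false "hi" answer moves it from spam to ham.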


Market Influences:

1) Contraction in the Email services market; less "systems" expertise is 
available.

2) DIY installs also "dumb-down" systems knowledge requirements.
3) SA has a desire to provide some protection in a default installation.
4) Migration to Zero-Trust environments.
5) Integration of DNS into O/S (like the stub resolver problems in 
Debian/Ubuntu) - can't just slap BIND on a machine anymore.


I am 100% FOR dropping DNSWL, any way it is done, although I don't have 
any problem with the existing handling of BLOCKED responses from 
Validity, SpamHaus, and others.  It *seems to me* that DNSWL-type 
services are better used as overrides at SMTP-time to DNSBL blocks.


Doing that is a legitimate choice by a reputation service, but it's 
not one SA can endorse. The fact that it is enforced by whim rather 
than mechanically is not a positive factor.
Is there any possibility to detect clients using open DNS, perhaps 
other than RCVD_IN_ZEN_BLOCKED_OPENDNS ?


Then, block all dnsbl/rhsbl rules?

I don't see any truly viable solution without conducting other lookups 
first.   A possible alternative would be to configure an unrestricted 
open DNS server that returns to the client, in response to a query, the 
IP address of the DNS host from where the query originated.  Sort of 
like the old, never-used, TCP Echo service.
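
Assuming such an echo service existed, the client-side check could be as small as the Python sketch below. Everything here is hypothetical: the echo service itself, the way its answer is obtained, and the helper name `resolver_is_local`. The sketch only covers the final comparison between the echoed resolver address and the networks the site considers its own.

```python
import ipaddress
from typing import Iterable


def resolver_is_local(echoed_resolver_ip: str, own_networks: Iterable[str]) -> bool:
    """Return True if the resolver address seen by the echo service is ours.

    echoed_resolver_ip: the address the (hypothetical) echo service reported.
    own_networks:       CIDR blocks the site operates, e.g. ["192.0.2.0/24"].
    """
    addr = ipaddress.ip_address(echoed_resolver_ip)
    return any(addr in ipaddress.ip_network(net) for net in own_networks)


if __name__ == "__main__":
    # Documentation-range addresses used purely as placeholders.
    print(resolver_is_local("192.0.2.53", ["192.0.2.0/24"]))
    print(resolver_is_local("198.51.100.7", ["192.0.2.0/24"]))
```

If the check fails, the installation is reaching the list through someone else's resolver and could refrain from DNS list queries, which is the behavior being asked for.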


Of course, the devil is in the details.  But I like your thinking Matus 
:)  My mind is about as sharp as a cooked linguine noodle. I'm sure 
there are a lot of people out there that can conjure up better solutions.



-- Jared Hall




Apology (was Re: ATTENTION: DNSWL to be disabled by default.)

2024-09-24 Thread Bill Cole
On 2024-09-24 at 09:13:16 UTC-0400 (Tue, 24 Sep 2024 09:13:16 -0400)
Bill Cole 
is rumored to have said:

> On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200)
> Matthias Leisi 
> is rumored to have said:
> (Quoting me)
>>>
>>> people who don't configure it correctly, in a way that is *almost 
>>> invisible.* The lower rate limit which they established in March of this 
>>> year isn't inherently bad, it just meant that enough people were hitting 
>>> the limit that someone bothered to open a bug about it.
>>>
>>
>> There is no new rule. The limit of 100‘000 per 24 hours has been in place 
>> for years.
>
> That's an interesting assertion. The page I cited has apparently changed in 
> the past day and the previous statement of a new policy has vanished. I'm 
> happy with assuming that it was an error that you've corrected.

I WAS WRONG.

The apparent explanation for that error is that I had both of these pages 
opened and somehow conflated them.

https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US
https://www.dnswl.org/?p=120

I am sorry for suggesting that this was a change, as it was clearly entirely my 
error. I have corrected the error in the rules file comment. Sadly, sent mail 
is forever...

-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole
On 2024-09-24 at 05:09:50 UTC-0400 (Tue, 24 Sep 2024 11:09:50 +0200)
Tom Bartel 
is rumored to have said:

> I'm not sure if the 10,000 limit is possibly in reference to the Validity
> allow list...
>
> https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US
>
> We recently added a registration gate - no fees for usage above 10,000 / 30
> days, however registration of your query IPs will give you that capability.

MEA CULPA.

I'm not sure how I managed to do it, but that is almost certainly the 
explanation of my obvious error.
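
For reference, the arithmetic behind the two limits mentioned in this thread can be worked through in a few lines of Python. The figures (350 messages/day, 10,000 per 30 days, 100,000 per day) come from the messages in this thread; the assumption of one list query per message and the helper name `queries_in_window` are mine.

```python
def queries_in_window(messages_per_day: int, days: int,
                      queries_per_message: int = 1) -> int:
    """Total list queries over a window, assuming a fixed number per message."""
    return messages_per_day * days * queries_per_message


if __name__ == "__main__":
    per_30_days = queries_in_window(350, 30)  # 350 * 30 = 10500
    per_day = queries_in_window(350, 1)       # 350
    print(per_30_days > 10_000)   # compared with a 10,000 per 30 days limit
    print(per_day > 100_000)      # compared with a 100,000 per day limit
```

So a 350 messages/day site would exceed a limit of 10,000 per 30 days but sits far below 100,000 per day.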


>
> Tom
>
> On Tue, Sep 24, 2024 at 10:16 AM Peter Ajamian 
> wrote:
>
>> On 24/09/24 05:02, Bill Cole wrote:
>>> Note
>>> that as of 2024-03-01 (as documented at the DNSWL link above) they have
>>> reduced the free limit to 10,000 queries per 30 days. A site feeding 350
>>> messages/day to SpamAssassin will exceed that limit. That is small even
>>> for "personal" systems.
>>
>> I've hunted through the links and the DNSWL.org site and cannot find any
>> reference to 10,000 queries per 30 days.  I do find lots of references
>> to the 100,000 queries per day limit, though.  Can you point out exactly
>> where the 10,000 reference is?
>>
>>
>> Thanks,
>>
>>
>> Peter
>>
>
>
> -- 
> Phone: 303.517.9655
> Website: https://bartelphoto.com
> Instagram: https://instagram.com/bartel_photo
>
> "Life's most persistent and urgent question is, 'What are you doing for
> others?'" - Martin Luther King Jr.


-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole
On 2024-09-24 at 10:10:24 UTC-0400 (Tue, 24 Sep 2024 16:10:24 +0200)
Matus UHLAR - fantomas 
is rumored to have said:

 TL;DR: Rather than using an in-band signal of a special reply value to 
 queries from blocked users, as do other DNS-Based List operators, 
 DNSWL.org sends back a "listed high" response to all queries. I was unaware
>
>> On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200) 
>> Matthias Leisi  is rumored to have said:
>>> Not to all queries. It is sent to resolvers who consistently go above the 
>>> limits, sometimes for months and years after receiving the blocked response.
>
> On 24.09.24 09:13, Bill Cole wrote:
>> I don't see how that's significant. The documented policy is directly and 
>> intentionally harmful to users.
>
> I understand this case as "abusers" instead of users.

In the context of spam control tactics, I'm not ready to call people who have 
no idea (and no way to see) that they are part of abusive behavior, "abusers."

E.g. the cited bug. It was reported by someone with no control of their SA 
config, as it is handled by their "web host." Presumably they use something 
like cPanel which puts email in the hands of the platform provider rather than 
the domain owner. The provider may (or may not) have seen the BLOCKED replies 
whenever they actually occurred, but the end user only knows that now, he gets 
mail from the worst spammers marked as definitively good by DNSWL, courtesy of 
SA. That's bad for the user, for SA, for DNSWL, and for the host.

>> Doing that is a legitimate choice by a reputation service, but it's not one 
>> SA can endorse. The fact that it is enforced by whim rather than 
>> mechanically is not a positive factor.
>
> Is there any possibility to detect clients using open DNS, perhaps other than 
> RCVD_IN_ZEN_BLOCKED_OPENDNS ?
>
> Then, block all dnsbl/rhsbl rules?

That sounds like a *great* idea and I'm sure it could be implemented.

Patches welcome, always. This list's sibling at dev@s.a.o is the ideal place to 
discuss implementation detail with others. Those of us able to commit to the 
repo are always happy to add other people's code and credit it, but for the 
most part the evidence supports the conclusion that as a group we are not 
wealthy enough in free time to add features to SA.

Another approach which could be simpler is to score the *_BLOCKED rules 
strongly enough to set off alarms. I don't like that much because it is using 
damage to get attention, but at least it would lead alarmed users to a correct 
conclusion about the root cause, rather than misrepresenting a reputation 
service's actual answer.

-- 
Bill Cole


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Matus UHLAR - fantomas
TL;DR: Rather than using an in-band signal of a special reply 
value to queries from blocked users, as do other DNS-Based List 
operators, DNSWL.org sends back a "listed high" response to all 
queries. I was unaware


On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200) 
Matthias Leisi  is rumored to have said:
Not to all queries. It is sent to resolvers who consistently go 
above the limits, sometimes for months and years after receiving the 
blocked response.


On 24.09.24 09:13, Bill Cole wrote:
I don't see how that's significant. The documented policy is directly 
and intentionally harmful to users.


I understand this case as "abusers" instead of users.

Doing that is a legitimate choice 
by a reputation service, but it's not one SA can endorse. The fact 
that it is enforced by whim rather than mechanically is not a positive 
factor.


Is there any possibility to detect clients using open DNS, perhaps other 
than RCVD_IN_ZEN_BLOCKED_OPENDNS ?


Then, block all dnsbl/rhsbl rules?


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux is like a teepee: no Windows, no Gates and an apache inside...


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Bill Cole

On 2024-09-24 at 04:18:06 UTC-0400 (Tue, 24 Sep 2024 10:18:06 +0200)
Matthias Leisi 
is rumored to have said:
(Quoting me)

>> people who don't configure it correctly, in a way that is *almost
>> invisible.* The lower rate limit which they established in March of
>> this year isn't inherently bad, it just meant that enough people were
>> hitting the limit that someone bothered to open a bug about it.

> There is no new rule. The limit of 100‘000 per 24 hours has been
> in place for years.


That's an interesting assertion. The page I cited has apparently changed 
in the past day and the previous statement of a new policy has vanished. 
I'm happy with assuming that it was an error that you've corrected.


However, as I said, the only significance of a particular rate limit is 
how many people are affected. The scale of the harm is not relevant, the 
problem is the intentional infliction of harm on users who likely have 
no idea what is happening.


This change in the SA rules was supposed to have been made 13 years ago. 
That's when the decision was made, based on the 100k/day threshold. The 
only reason I felt the need to announce it was the fact that back in 
2011, the intended change did not actually happen, so people have been 
using DNSWL even while the relevant rules file stated that the rules 
were disabled by default.


> Enforcement of the limit is intentionally „weak“, we only look at
> new „overusers“ every few weeks.


Irrelevant. The policy is intentionally harmful. Its weak enforcement
could even be seen as a problem per se.


>> TL;DR: Rather than using an in-band signal of a special reply value
>> to queries from blocked users, as do other DNS-Based List operators,
>> DNSWL.org sends back a "listed high" response to all queries. I was
>> unaware

> Not to all queries. It is sent to resolvers who consistently go above
> the limits, sometimes for months and years after receiving the blocked
> response.


I don't see how that's significant. The documented policy is directly 
and intentionally harmful to users. Doing that is a legitimate choice by 
a reputation service, but it's not one SA can endorse. The fact that it 
is enforced by whim rather than mechanically is not a positive factor.


>> # DNSWL is a commercial service that requires payment for servers
>> over 100K queries daily.

> The subscriptions to dnswl.org easily cover the infrastructure cost,
> but not much more.


— Matthias, for the dnswl.org project


Semantic dispute. Charging a fee for a service is intrinsically and 
unavoidably commercial. I appreciate that you are not running the 
service as a means of building wealth.


Personally, I consider the existence of DNSWL to be positive for the 
email ecosystem. I believe that sites which stay within the limit can 
reduce FPs by using it. That does not change the basic fact that using 
it blindly is dangerous. Just as new system installations don't deploy a 
fully-functioning MTA to accept external mail, SA strives to NOT enable 
dangerous 3rd-party tools by default.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Tom Bartel
I'm not sure if the 10,000 limit is possibly in reference to the Validity
allow list...

https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US

We recently added a registration gate - no fees for usage above 10,000 / 30
days, however registration of your query IPs will give you that capability.

Tom

On Tue, Sep 24, 2024 at 10:16 AM Peter Ajamian 
wrote:

> On 24/09/24 05:02, Bill Cole wrote:
> > Note
> > that as of 2024-03-01 (as documented at the DNSWL link above) they have
> > reduced the free limit to 10,000 queries per 30 days. A site feeding 350
> > messages/day to SpamAssassin will exceed that limit. That is small even
> > for "personal" systems.
>
> I've hunted through the links and the DNSWL.org site and cannot find any
> reference to 10,000 queries per 30 days.  I do find lots of references
> to the 100,000 queries per day limit, though.  Can you point out exactly
> where the 10,000 reference is?
>
>
> Thanks,
>
>
> Peter
>


-- 
Phone: 303.517.9655
Website: https://bartelphoto.com
Instagram: https://instagram.com/bartel_photo

"Life's most persistent and urgent question is, 'What are you doing for
others?'" - Martin Luther King Jr.


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Matthias Leisi

> 
> people who don't configure it correctly, in a way that is *almost invisible.* 
> The lower rate limit which they established in March of this year isn't 
> inherently bad, it just meant that enough people were hitting the limit that 
> someone bothered to open a bug about it.
> 

There is no new rule. The limit of 100‘000 per 24 hours has been in place for 
years.

Enforcement of the limit is intentionally „weak“, we only look at new 
„overusers“ every few weeks.

> TL;DR: Rather than using an in-band signal of a special reply value to 
> queries from blocked users, as do other DNS-Based List operators, DNSWL.org 
> sends back a "listed high" response to all queries. I was unaware
> 

Not to all queries. It is sent to resolvers who consistently go above the 
limits, sometimes for months and years after receiving the blocked response. 

> # DNSWL is a commercial service that requires payment for servers over 100K 
> queries daily.
> 

The subscriptions to dnswl.org easily cover the infrastructure cost, but not 
much more.

— Matthias, for the dnswl.org project 



Re: ATTENTION: DNSWL to be disabled by default.

2024-09-24 Thread Peter Ajamian

On 24/09/24 05:02, Bill Cole wrote:

Note
that as of 2024-03-01 (as documented at the DNSWL link above) they have
reduced the free limit to 10,000 queries per 30 days. A site feeding 350
messages/day to SpamAssassin will exceed that limit. That is small even
for "personal" systems.


I've hunted through the links and the DNSWL.org site and cannot find any 
reference to 10,000 queries per 30 days.  I do find lots of references 
to the 100,000 queries per day limit, though.  Can you point out exactly 
where the 10,000 reference is?



Thanks,


Peter


Re: Bayes in V4 compared to V3

2024-09-24 Thread Grega via users
Also this:

Rule      Description                          Score  Total  Ham    Ham %  Spam  Spam %
BAYES_40  Bayes spam probability is 20 to 40%  0.00   2,784  2,721  97.7   63    2.3
BAYES_50  Bayes spam probability is 40 to 60%  0.80   126    93     73.8   33    26.2
BAYES_60  Bayes spam probability is 60 to 80%  1.50   437    127    29.1   310   70.9
BAYES_80  Bayes spam probability is 80 to 95%  7.00   266    1      0.4    265   99.6

I only get BAYES_40 to BAYES_80 after clearing the Bayes DB and manually 
re-learning on 2500 HAM and 2500 SPAM messages.
So NO BAYES lower than 40 or higher than 80...

Something is definitely wrong here: Bayes is not a decision maker at all; for 
me it is useless. This indecisiveness, along with the fact that some mails 
aren't even Bayes-scored, makes me think there is a bug, or did I implement it 
wrong?





Re: ATTENTION: DNSWL to be disabled by default.

2024-09-23 Thread Bill Cole

On 2024-09-23 at 13:08:17 UTC-0400 (Mon, 23 Sep 2024 17:08:17 +)
Grega via users 
is rumored to have said:

> Maybe disable the VALIDITY rules as well... They also have a 10k limit
> in a 30-day window.


My understanding is that Validity returns a specific value 
(127.255.255.255) for blocked queries. That makes it safe to have the 
rules enabled because you then hit the BLOCKED rule for the specific 
Validity list, which has a trivial non-zero score. That is a *visible 
and harmless* marker on almost every message which should be noticed by 
the user, who can correct the underlying configuration error.
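
A minimal sketch of the in-band "blocked" signal described above, in Python. The reply value 127.255.255.255 is the one cited in this message; the function name and the three result labels are hypothetical and are not SpamAssassin code. It only shows why a dedicated reply value lets a client tell "you are over the limit" apart from a real listing.

```python
import ipaddress

# Reply value quoted in the message above; verify it against the list
# operator's current documentation before relying on it.
BLOCKED_REPLY = ipaddress.IPv4Address("127.255.255.255")


def classify_reply(answers):
    """Classify the A records returned by a DNS-based list lookup.

    answers: iterable of dotted-quad strings; an empty iterable stands for
    "no such record" (the queried address is not listed).
    Returns one of "not_listed", "blocked", "listed" (hypothetical labels).
    """
    addrs = [ipaddress.IPv4Address(a) for a in answers]
    if not addrs:
        return "not_listed"
    if BLOCKED_REPLY in addrs:
        # The operator is refusing service, not describing the sender.
        return "blocked"
    return "listed"


if __name__ == "__main__":
    for sample in ([], ["127.255.255.255"], ["127.0.0.2"]):
        print(sample, "->", classify_reply(sample))
```

A list that answers over-limit clients with an ordinary "listed" value gives the client nothing to branch on, which is the difference being argued about in this thread.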


DNSWL.org *intentionally causes harm* for people who don't configure it 
correctly, in a way that is *almost invisible.* The lower rate limit 
which they established in March of this year isn't inherently bad, it 
just meant that enough people were hitting the limit that someone 
bothered to open a bug about it.


As I noted in my lengthy comment in that bug report, we (the SA 
community, particularly committers) are not an organized workforce with 
duties and assignments, and we make changes to established 
statically-scored rules on an as-noticed and as-needed basis. This is 
partly because we are considerate of the fact that we have users who 
build on top of the mostly-stable default rules. It is also because we 
are all volunteers, with lives and jobs that generally take priority 
over making SA better.






Regards,G


From: Bill Cole 
Sent: Monday, September 23, 2024 19:03
To: SpamAssassin-Users
Subject: ATTENTION: DNSWL to be disabled by default.


Context:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8193
https://www.dnswl.org/?p=120

TL;DR: Rather than using an in-band signal of a special reply value to 
queries from blocked users, as do other DNS-Based List operators, 
DNSWL.org sends back a "listed high" response to all queries. I was 
unaware of this until bug 8193 was opened and linked to the DNSWL 
statement of that policy. As I write in a comment on that bug, no one 
should ever be using DNSBLs of any sort blindly and the onus is on the 
configuring user of SA to select them prudently as they all have 
limits.



I believe this is a problem that needs fixing, but it's a change that 
may surprise some users. Consider yourself warned...


Right now, there's a comment in 50_scores.cf (the file for 
manually-set scores) that I had not previously seen:


# DNSWL is a commercial service that requires payment for servers over 100K queries daily.
# Unfortunately, they will return true answers for DNS servers they consider abusive so
# SA Admins must enable these rules manually.

And yet, the scores following that comment *enable* the rules. Note 
that as of 2024-03-01 (as documented at the DNSWL link above) they 
have reduced the free limit to 10,000 queries per 30 days. A site 
feeding 350 messages/day to SpamAssassin will exceed that limit. That 
is small even for "personal" systems.
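
The arithmetic behind the "350 messages/day" remark can be checked with a few lines of Python. This is only a sketch: it takes the 10,000 queries per 30 days figure as stated in this message (the figure is questioned elsewhere in the thread) and assumes exactly one list query per message, which is a simplification.

```python
# Figures as stated in the message above; both are assumptions here.
FREE_LIMIT = 10_000   # queries allowed per window
WINDOW_DAYS = 30      # length of the window in days


def queries_in_window(messages_per_day, queries_per_message=1, days=WINDOW_DAYS):
    """Total list queries generated over one window."""
    return messages_per_day * queries_per_message * days


def max_daily_messages(limit=FREE_LIMIT, days=WINDOW_DAYS, queries_per_message=1):
    """Largest whole number of messages per day that stays within the limit."""
    return limit // (days * queries_per_message)


if __name__ == "__main__":
    print(queries_in_window(350))   # 10500, which is above 10000
    print(max_daily_messages())     # 333
```

Under those assumptions 350 messages/day produces 10,500 queries in 30 days, and anything above 333 messages/day crosses the limit.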


Pending a discussion on the issue reaching some other consensus, I am 
immediately changing all those scores to zero in 50_scores.cf so that 
the rules WILL BE DISABLED by default as documented in the comment. I 
am also correcting the rate cited in that comment. This change should 
take effect in the rules distribution in the next couple of days.


Whether or not you want to use DNSWL is very much a local choice. At 
10k queries/month, MOST sites will need to either register (and likely 
pay DNSWL) or leave the rules disabled.


   b...@scconsult.com or billc...@apache.org
   (AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

   Not Currently Available For Hire



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: ATTENTION: DNSWL to be disabled by default.

2024-09-23 Thread Grega via users
Maybe disable the VALIDITY rules as well... They also have a 10k limit in a 
30-day window.

Regards,G


From: Bill Cole 
Sent: Monday, September 23, 2024 19:03
To: SpamAssassin-Users
Subject: ATTENTION: DNSWL to be disabled by default.


Context:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8193
https://www.dnswl.org/?p=120

TL;DR: Rather than using an in-band signal of a special reply value to queries 
from blocked users, as do other DNS-Based List operators, DNSWL.org sends back 
a "listed high" response to all queries. I was unaware of this until bug 8193 
was opened and linked to the DNSWL statement of that policy. As I write in a 
comment on that bug, no one should ever be using DNSBLs of any sort blindly and 
the onus is on the configuring user of SA to select them prudently as they all 
have limits.


I believe this is a problem that needs fixing, but it's a change that may 
surprise some users. Consider yourself warned...

Right now, there's a comment in 50_scores.cf (the file for manually-set scores) 
that I had not previously seen:

# DNSWL is a commercial service that requires payment for servers over 100K queries daily.
# Unfortunately, they will return true answers for DNS servers they consider abusive so
# SA Admins must enable these rules manually.

And yet, the scores following that comment *enable* the rules. Note that as of 
2024-03-01 (as documented at the DNSWL link above) they have reduced the free 
limit to 10,000 queries per 30 days. A site feeding 350 messages/day to 
SpamAssassin will exceed that limit. That is small even for "personal" systems.

Pending a discussion on the issue reaching some other consensus, I am 
immediately changing all those scores to zero in 50_scores.cf so that the rules 
WILL BE DISABLED by default as documented in the comment. I am also correcting 
the rate cited in that comment. This change should take effect in the rules 
distribution in the next couple of days.

Whether or not you want to use DNSWL is very much a local choice. At 10k 
queries/month, MOST sites will need to either register (and likely pay DNSWL) 
or leave the rules disabled.

   b...@scconsult.com or billc...@apache.org
   (AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)
   Not Currently Available For Hire



Re: mailspike dot net Minus 1?

2024-09-23 Thread joe a

On 9/21/2024 14:06:28, Reindl Harald (privat) wrote:



> On 21.09.24 at 18:51, joe a wrote:
>
>> Noticed some obvious spam slipping in due in great part to this:
>>
>> * -1.0 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2)
>> *  [209.85.166.199 listed in wl.mailspike.net]
>>
>> Not a big deal for my low volume SOHO, but it's annoying.
>>
>> Has that check become unreliable?  Sure, I can skip that check (I
>> think) or alter the score, but any other thoughts?
>
> what makes you think a single rule is that important?
>
> sometimes IPs on whitelists start to send spam, sometimes
> spamhosts are not on a blacklist until they are - so what's the fuss
> about?
>
> 100% clear spam won't survive just because of a single -1 rule


Here is a more complete list from a very similar message, received 
today.  I failed to report the last -1.0 when I posted earlier.


X-Spam-Report:
*  1.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
*  [score: 1.]
*  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 1.]
* -0.9 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2)
*  [209.85.219.198 listed in wl.mailspike.net]
*  0.2 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
*  mail domains are different
*  0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
*  0.7 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
*  1.0 FREEMAIL_FROM Sender email is commonly abused enduser mail
*  provider
*  [lurramachile[at]att.net]
*  0.0 HTML_MESSAGE BODY: HTML included in message
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
*   valid
*  0.0 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and
*  EnvelopeFrom freemail headers are different
* -1.0 MAILING_LIST_MULTI Multiple indicators imply a widely-seen list
*   manager
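
For reference, the scores in the report above can be added up with a short Python sketch. The rule names and scores are copied from the report; the threshold of 5.0 is SpamAssassin's stock default for required_score and is an assumption about this site, and the helper name is hypothetical.

```python
# SpamAssassin's stock default threshold; this site's setting is not shown.
REQUIRED_SCORE = 5.0

# Scores copied from the X-Spam-Report above.
REPORT = {
    "BAYES_999": 1.2,
    "BAYES_99": 3.5,
    "RCVD_IN_MSPIKE_H2": -0.9,
    "HEADER_FROM_DIFFERENT_DOMAINS": 0.2,
    "SPF_HELO_NONE": 0.0,
    "SPF_SOFTFAIL": 0.7,
    "FREEMAIL_FROM": 1.0,
    "HTML_MESSAGE": 0.0,
    "DKIM_VALID": -0.1,
    "DKIM_SIGNED": 0.1,
    "FREEMAIL_FORGED_FROMDOMAIN": 0.0,
    "MAILING_LIST_MULTI": -1.0,
}


def total(scores, skip=()):
    """Sum the rule scores, leaving out any rule named in `skip`."""
    return round(sum(v for k, v in scores.items() if k not in skip), 1)


if __name__ == "__main__":
    print(total(REPORT))                                                   # 4.7
    print(total(REPORT, skip=("RCVD_IN_MSPIKE_H2", "MAILING_LIST_MULTI")))  # 6.6
```

With these figures the message totals 4.7, just under a 5.0 threshold; leaving out the two reputation-style negative rules gives 6.6.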




Re: Disable validity rules

2024-09-23 Thread Bill Cole

On 2024-09-23 at 09:15:25 UTC-0400 (Mon, 23 Sep 2024 13:15:25 +)
Grega via users 
is rumored to have said:


Hi.


Where can one disable this?


One can disable any rule by adding a score line in local.cf for the rule 
with a score of 0, e.g.:



score  RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  0
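
To cover all three rules named in the question, the same kind of line is needed once per rule. A small Python sketch that prints them, assuming only the `score NAME 0` syntax shown above; the helper name is hypothetical and the output still has to be pasted into local.cf by hand.

```python
# Rule names taken from the question quoted in this thread.
VALIDITY_BLOCKED_RULES = [
    "RCVD_IN_VALIDITY_CERTIFIED_BLOCKED",
    "RCVD_IN_VALIDITY_RPBL_BLOCKED",
    "RCVD_IN_VALIDITY_SAFE_BLOCKED",
]


def disable_lines(rule_names):
    """Return one 'score NAME 0' line per rule, suitable for local.cf."""
    return [f"score {name} 0" for name in rule_names]


if __name__ == "__main__":
    print("\n".join(disable_lines(VALIDITY_BLOCKED_RULES)))
```

Scoring only the *_BLOCKED rules to 0 hides the notice without fixing the blocked queries, so it is worth checking the resolver setup as well.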



RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  ADMINISTRATOR NOTICE: The 
query to Validity was blocked. See 
https://knowledge.validity.com/hc/en-us/articles/20961730681243 for 
more information.
RCVD_IN_VALIDITY_RPBL_BLOCKED   ADMINISTRATOR NOTICE: The query to 
Validity was blocked. See 
https://knowledge.validity.com/hc/en-us/articles/20961730681243 for 
more information.
RCVD_IN_VALIDITY_SAFE_BLOCKED   ADMINISTRATOR NOTICE: The query to 
Validity was blocked. See 
https://knowledge.validity.com/hc/en-us/articles/20961730681243 for 
more information.


Thanks!



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Disable validity rules

2024-09-23 Thread Grega via users
True.


I have added it and will report back in few days...


Regards,G



From: Reindl Harald (privat) 
Sent: Monday, 23 September 2024 15:31
To: Grega; users@spamassassin.apache.org
Subject: Re: Disable validity rules



On 23.09.24 at 15:23, Grega via users wrote:
> I have local unbound resolver without forwarding and all other lists are
> working just this one not :)

but i doubt you have this:

cache-min-ttl: 60
cache-max-negative-ttl: 60

DNSBL/DNSWL usually have a very short TTL

> I`m not the only one with this issue...
>
> 
> *From:* Reindl Harald (privat) 
> *Sent:* Monday, 23 September 2024 15:20
> *To:* Grega; users@spamassassin.apache.org
> *Subject:* Re: Disable validity rules
>
>
> On 23.09.24 at 15:15, Grega via users wrote:
>
>> Where can one disable this?
>>
>> RCVD_IN_VALIDITY_CERTIFIED_BLOCKED ADMINISTRATOR NOTICE: The query to
>> Validity was blocked. See
>> https://knowledge.validity.com/hc/en-us/articles/20961730681243
> <https://knowledge.validity.com/hc/en-us/articles/20961730681243> for more
>> information.
>> RCVD_IN_VALIDITY_RPBL_BLOCKED  ADMINISTRATOR NOTICE: The query to
>> Validity was blocked. See
>> https://knowledge.validity.com/hc/en-us/articles/20961730681243
> <https://knowledge.validity.com/hc/en-us/articles/20961730681243> for more
>> information.
>> RCVD_IN_VALIDITY_SAFE_BLOCKED  ADMINISTRATOR NOTICE: The query to
>> Validity was blocked. See
>> https://knowledge.validity.com/hc/en-us/articles/20961730681243
> <https://knowledge.validity.com/hc/en-us/articles/20961730681243> for more
>> information.
>>
>> Thanks!
>
> in local.cf as everything else by score it with 0
> but better seek the reason which is in most cases a shared dns resolver
>
> with a local unbound resolver WITHOUT FORWARDING and proper caching this
> should be solved in most cases
>
> cache-min-ttl: 60
> cache-max-negative-ttl: 60


Re: Disable validity rules

2024-09-23 Thread Grega via users
I have local unbound resolver without forwarding and all other lists are 
working just this one not :)

I`m not the only one with this issue...



From: Reindl Harald (privat) 
Sent: Monday, 23 September 2024 15:20
To: Grega; users@spamassassin.apache.org
Subject: Re: Disable validity rules



On 23.09.24 at 15:15, Grega via users wrote:

> Where can one disable this?
>
> RCVD_IN_VALIDITY_CERTIFIED_BLOCKED ADMINISTRATOR NOTICE: The query to
> Validity was blocked. See
> https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
> information.
> RCVD_IN_VALIDITY_RPBL_BLOCKED  ADMINISTRATOR NOTICE: The query to
> Validity was blocked. See
> https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
> information.
> RCVD_IN_VALIDITY_SAFE_BLOCKED  ADMINISTRATOR NOTICE: The query to
> Validity was blocked. See
> https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
> information.
>
> Thanks!

in local.cf as everything else by score it with 0
but better seek the reason which is in most cases a shared dns resolver

with a local unbound resolver WITHOUT FORWARDING and proper caching this
should be solved in most cases

cache-min-ttl: 60
cache-max-negative-ttl: 60


Re: Bayes in V4 compared to V3

2024-09-23 Thread Grega via users
Hi again.


In V4 there is something wrong with bayes...


I received 3 identical mails (1 external sender, 3 internal recipients) and 
scores are like this:


2 X like:

 0.00  ARC_SIGNED          Message has a ARC signature
-0.10  ARC_VALID           Message has a valid ARC signature
-0.40  DCC_REPUT_00_12     DCC reputation between 0 and 12 % (mostly ham)
 0.10  DKIM_INVALID        DKIM or DK signature exists, but is not valid
 0.10  DKIM_SIGNED         Message has a DKIM or DK signature, not necessarily valid
-0.00  DMARC_PASS          DMARC pass policy
 0.25  GMD_PDF_HORIZ       Contains pdf 100-240 (high) x 450-800 (wide)
 0.50  GMD_PDF_SQUARE      Contains pdf 180-360 (high) x 180-360 (wide)
 0.00  HTML_MESSAGE        HTML included in message
 1.02  MISSING_HEADERS     Missing To: header
 1.50  PHISH_LNK_URI       Typical phishing tactic - pre filled mail in link
-0.00  RCVD_IN_DNSWL_NONE  Sender listed at https://www.dnswl.org/, no trust
 0.00  RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
 0.00  RCVD_IN_VALIDITY_RPBL_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
 0.00  RCVD_IN_VALIDITY_SAFE_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
-0.00  SPF_HELO_PASS       SPF: HELO matches SPF record



AND 1X like:

 0.00  ARC_SIGNED          Message has a ARC signature
-0.10  ARC_VALID           Message has a valid ARC signature
 1.50  BAYES_60            Bayes spam probability is 60 to 80%
-0.40  DCC_REPUT_00_12     DCC reputation between 0 and 12 % (mostly ham)
 0.10  DKIM_INVALID        DKIM or DK signature exists, but is not valid
 0.10  DKIM_SIGNED         Message has a DKIM or DK signature, not necessarily valid
-0.00  DMARC_PASS          DMARC pass policy
 0.25  GMD_PDF_HORIZ       Contains pdf 100-240 (high) x 450-800 (wide)
 0.50  GMD_PDF_SQUARE      Contains pdf 180-360 (high) x 180-360 (wide)
 0.00  HTML_MESSAGE        HTML included in message
 1.02  MISSING_HEADERS     Missing To: header
 1.50  PHISH_LNK_URI       Typical phishing tactic - pre filled mail in link
-0.00  RCVD_IN_DNSWL_NONE  Sender listed at https://www.dnswl.org/, no trust
 0.00  RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
 0.00  RCVD_IN_VALIDITY_RPBL_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
 0.00  RCVD_IN_VALIDITY_SAFE_BLOCKED  ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
-0.00  SPF_HELO_PASS       SPF: HELO matches SPF record



Why one has "BAYES_60" and other 2 not?


My thoughts so far:

  1.  This is not shortcircuit, as only Bayes is different.
  2.  Mails are identical and mailserver load is... well, non-existent (1 minute 
load 0.08)
  3.  Maybe some new logic in Bayes to skip some?
  4.  Race condition (IDK, I'm not a coder)
  5.  Bayes behaves inconsistently on BOTH installs I have it on




From: John Hardin 
Sent: Friday, 13 September 2024 20:38
To: SpamAssassin-Users
Subject: Re: Bayes in V4 compared to V3

On Fri, 13 Sep 2024, Bill Cole wrote:

> Please send any replies to the list only.

...or to Harald only.


--
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  jhar...@impsec.org pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
   USMC Rules of Gunfighting #20: The faster you finish the fight,
   the less shot you will get.
---
  Today: the 459th anniversary of the muslim Ottoman defeat at Malta


Re: SPAM-DETECTOR Re: Tips on training bayes?

2024-09-19 Thread natan

On 18.09.2024 at 16:29, Matus UHLAR - fantomas wrote:

On 18.09.24 16:19, natan wrote:
I was very disappointed with spamassassin 4.x because it started to 
grow /var/lib/amavis/tmp/


amavis should clean this itself.
which amavis version do you have installed?
did you tune it anyhow?


amavisd-new-2.11.1 (20181009) (sic!)


> Did you enable and configure the ExtractText plugin?
> Because that one may be kinda filling it up.

Probably yes:

loadplugin Mail::SpamAssassin::Plugin::ExtractText
/etc/spamassassin/init.pre:#loadplugin Mail::SpamAssassin::Plugin::ExtractText
/etc/spamassassin/PDFInfo2.pm:and enable it using the L<https://spamassassin.apache.org/full/4.0.x/doc/Mail_SpamAssassin_Plugin_ExtractText.html> plugin.
/etc/spamassassin/PDFInfo2.pm:    must be extracted by another plugin such as ExtractText.pm


for SA4.x




With SA 3.4.X - on average 100MB and it deletes on the fly
With SA 4.X - on average 2-6GB and I had to do a quick fix:
59 23 * * * root find /var/lib/amavis/tmp/ -mtime +0 -delete;

On 18.09.2024 at 16:09, Matus UHLAR - fantomas wrote:

On 18.09.24 13:42, Grega via users wrote:
Right now in SA 4.0.1, Bayes, at least for me, is really challenging 
to train and set up.

I had a well-trained DB from a past V3 install, and it behaved really oddly.

I trained it on a new set of mails, 3000 spam and 3000 ham (hand-picked 
mail, it was a PAIN), and I can't get either BAYES_00 or BAYES_99 :)

I mean I get them occasionally, but not even close to what it was 
in V3.

In V3, SA Bayes was decisive; when well trained it was awesome.

Now in V4.0.1 Bayes is NOT decisive, and in 90% of cases it gives 
me BAYES_40 or _50 even after I mark those mails as SPAM or HAM.

What is even more weird is that some mails aren't even Bayes-scored 
at all. BAYES_XX is missing from the headers entirely and I 
don't know why...

I'm kind of sorry that I upgraded to 4.0.1...



looking at your first mail, it seems that you only have tokens for a 
few days:


dbg: bayes: corpus size: nspam = 1190, nham = 12441
dbg: bayes: DB expiry: tokens in DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, Newest atime: 1725888528, Last expire: 0, Current time: 1725888537

% date -d @1725361640
Tue Sep  3 13:07:20 CEST 2024

% date -d @1725888528
Mon Sep  9 15:28:48 CEST 2024
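
The same check can be done without `date`, as a small Python sketch. The two epoch values are the "Oldest atime" and "Newest atime" from the debug output above; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Values copied from the bayes debug output above.
OLDEST_ATIME = 1725361640
NEWEST_ATIME = 1725888528


def atime_span_days(oldest, newest):
    """Age range covered by the token database, in days."""
    return (newest - oldest) / 86400


if __name__ == "__main__":
    # Printed in UTC; the `date` output above is in CEST (UTC+2).
    print(datetime.fromtimestamp(OLDEST_ATIME, tz=timezone.utc).isoformat())
    print(datetime.fromtimestamp(NEWEST_ATIME, tz=timezone.utc).isoformat())
    print(round(atime_span_days(OLDEST_ATIME, NEWEST_ATIME), 2), "days")
```

The span works out to roughly 6.1 days, which matches the "tokens for a few days" observation.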


How do you call spamassassin, directly, via spamass-milter, amavis 
or other way?

Did you tune any bayes settings?
Do you have your trusted_networks and internal_networks set up 
properly?




--



Re: Use of uninitialized value $response[0]

2024-09-19 Thread Niamh Holding


Hello Bill,

Tuesday, September 17, 2024, 7:15:49 PM, you wrote:

BC> You should upgrade to 4.0.1. That error on that line indicates that you are 
running an obsolete 3.4.x version.

As far as that goes I'm just waiting to hear what the host of our VM says about 
updating it, as CentOS7 went EOL a couple of months ago.

No point in trying things until they've resolved that.

-- 
Best regards,
 Niamh    mailto:ni...@fullbore.co.uk



Re: Tips on training bayes?

2024-09-19 Thread Bill Cole

On 2024-09-17 at 16:29:52 UTC-0400 (Tue, 17 Sep 2024 16:29:52 -0400)
Alex 
is rumored to have said:




>> It is up to the user, ie you, what is and what is not spam.

> Well, yes, and no.
>
> Of course it's my own system and I can define these terms however I wish.
> I'm also familiar with the need to investigate every message - perhaps I
> should have made that clear initially.
>
> It's only these few types of messages that are very subjective and
> experience from the broader open source community would be appreciated.


The debate over the specific definition of "spam" is an old and diverse 
conversation. It has damaged friendships and careers.



> If it has a legitimate unsubscribe link, does that make it ham?


No.


> What criteria do you use to determine "spamminess/haminess of EVERY
> message"?


The Official Lumber Cartel acronym for spam is UBE:

Unsolicited: the sender has no sound reason to believe that the target 
requested this particular email (or narrowly defined class of email.)


Bulk: the sender appears to have sent substantially the same message to 
many different people without meaningful targeting. This can be inferred 
from generic content directed at the widest audience, e.g. commercial or 
political advertising.


Email: obvious.

Judging that requires some knowledge of the target. I can't tell you 
whether your borderline email is spam. Neither can SA, but Bayes is one 
way it tries to guess.


> Is the goal to have every message one of either BAYES_00 or BAYES_99 or is
> it okay that newsletters (for example) are BAYES_50, and let other rules,
> like network checks, determine the score?


The logical model of Naive Bayesian classification is for strictly 
binary classes. A message is either ham or spam. Identical messages can 
be ham in one mailbox and spam in another, so I suppose one could more 
accurately see the classification as being of the combined email and its 
envelope of metadata.


Bayesian classification does NOT provide a degree of "spamminess" in 
email, it provides a probability of mail being spam. That is a subtle 
but important distinction. A 50% Bayes score doesn't mean a message is 
semi-spam, it means Bayes cannot tell whether the message is spam. So 
yes, it is *OK* that Bayes can't tell whether a newsletter that has 
spam-like content but has an unsub link going to a usually-good ESP is 
spam or ham. A lot of email is that way: its insane HTML and/or 
hype-filled wording smells like spam but since the target wants it, it's 
ham.


This is a core design principle in SA: there's no perfect objective test 
for spam. That's why we have hundreds of scored rules and sub-rules and 
multiple shared reputation tests. A single test (such as Bayes) being 
wrong is not a flaw, it is an inescapable attribute of SA's design.
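
The difference between "cannot tell" and "half spam" can be seen in a toy calculation. The sketch below only illustrates naive Bayes combining in log-odds form. It is not SpamAssassin's implementation, which combines token probabilities differently, and the function name `combine` and every token probability are invented for the example.

```python
import math


def combine(token_probs):
    """Combine per-token spam probabilities into one probability.

    Each token contributes log(p / (1 - p)); the sum is mapped back to a
    probability. Evidence that cancels out lands at 0.5.
    """
    log_odds = sum(math.log(p / (1.0 - p)) for p in token_probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical token probabilities, chosen only for illustration.
clearly_spam = combine([0.95, 0.90, 0.85])      # all evidence points one way
newsletter = combine([0.90, 0.10, 0.80, 0.20])  # spammy and hammy tokens cancel

print(round(clearly_spam, 3), round(newsletter, 3))
```

The second message ends up at about 0.5 because its evidence is balanced, not because it is "half spam". That is the situation where the other rules have to decide.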



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Tips on training bayes?

2024-09-18 Thread Greg Troxel
Alex  writes:

> It's only these few types of messages that are very subjective and
> experience from the broader open source community would be appreciated.
>
> If it has a legitimate unsubscribe link, does that make it ham?
>
> What criteria do you use to determine "spamminess/haminess of EVERY
> message"?

I think you're asking the wrong question.

A message is spam if it is bulk and unsolicited.  So it is ham if the
user asked for it (truly asked, not failed to uncheck a pre-checked
box), and it is spam if they did not.   Of course humans are not
reliable about remembering.

Having an unsubscribe link *absolutely* does not make it ham.

So you have to ask users to classify, or you are just guessing.


> Is the goal to have every message one of either BAYES_00 or BAYES_99 or is
> it okay that newsletters (for example) are BAYES_50, and let other rules,
> like network checks, determine the score?

In general, the closer to the edge something is, the more useful the score,
but you can't actually push them all to 00/99.  There could be a
newsletter that user A asked for and is thus ham, but user B did not, and
when it arrives to them it is spam.

Thus, you tend to need per-user bayes.


But if bayes says 50, that's life and you roll with it.



Re: Tips on training bayes?

2024-09-18 Thread Benny Pedersen

Jared Hall via users wrote on 2024-09-18 20:08:

> On Deb-based distros, you can add this in /etc/amavis/conf.d/50-user
> under the $max_servers parameter.

also remember it's safe to use tmpfs for tmp dir in amavisd

no joke


Re: Tips on training bayes?

2024-09-18 Thread Jared Hall via users

On 9/18/2024 10:19 AM, natan wrote:

> Hi
> I was very disappointed with spamassassin 4.x because it started to
> grow /var/lib/amavis/tmp/
>
> With SA 3.4.X - on average 100MB and it deletes on the fly
> With SA 4.X - on average 2-6GB and I had to do a quick fix:
> 59 23 * * * root find /var/lib/amavis/tmp/ -mtime +0 -delete;



The tmp folders that are created by each Amavis child process hang
around until $max_requests is hit.  The default is 20.

If you have SSD drives where R/W times are negligible, you *might*
want to set $max_requests = 1;  each child will then clean up its
tmp folder after every message.

On Deb-based distros, you can add this in /etc/amavis/conf.d/50-user
under the $max_servers parameter.

-- Jared Hall








Re: Tips on training bayes?

2024-09-18 Thread Benny Pedersen

natan wrote on 2024-09-18 16:36:

> On 18.09.2024 at 16:30, Reindl Harald (privat) wrote:

who reply here ? :)

>> don't blame SA when a blind man can see that your problem is on the
>> Amavis side - why do one need Amavis to begin with when there is SA
>> and spamass-milter
>
> yes yes  everyone knows better why I use amavis ? Because each of
> my users has their own whitelist and blacklist and score rules

spamassassin can have all this info in sql/ldap, amavis can have access
to this as well, just not same as scoring rule set, so if this is used
in amavis it will be hard reject or accept pr forged results :/

> Amavis bad... uga buga . but this bad amavis works fine with every
> SA version except 4.X
>
> Of course, if I didn't need it, I wouldn't use it.

lol

try spampd then

> I'll tell you more I have amavis and rspamd (NTG) for testing and this
> bad amavis also works correctly

just use rspamd if unsure why it works, no logs no problem


Re: Tips on training bayes?

2024-09-18 Thread natan

On 18.09.2024 at 16:30, Reindl Harald (privat) wrote:

> On 18.09.24 at 16:19, natan wrote:
>
>> Hi
>> I was very disappointed with spamassassin 4.x because it started to
>> grow /var/lib/amavis/tmp/
>>
>> With SA 3.4.X - on average 100MB and it deletes on the fly
>> With SA 4.X - on average 2-6GB and I had to do a quick fix:
>> 59 23 * * * root find /var/lib/amavis/tmp/ -mtime +0 -delete;
>
> don't blame SA when a blind man can see that your problem is on the
> Amavis side - why do one need Amavis to begin with when there is SA
> and spamass-milter

yes yes, everyone knows better. Why do I use amavis? Because each of my
users has their own whitelist and blacklist and score rules

Amavis bad... uga buga . but this bad amavis works fine with every
SA version except 4.X

Of course, if I didn't need it, I wouldn't use it.

I'll tell you more I have amavis and rspamd (NTG) for testing and this
bad amavis also works correctly



On 18.09.2024 at 16:09, Matus UHLAR - fantomas wrote:

On 18.09.24 13:42, Grega via users wrote:
Right now in SA 4.0.1 bayes at least for me is really challenging 
to train and set up.


I had good trained DB from past V3 install, and it behaved really odd.

I trained it on new set of mails 3000 spam and 3000 ham (HAND 
PICKED mail it was PAIN) and I cant get either BAYES_00 or BAYES_99 :)


I mean I get them occasionally, but not even close to what it was 
in V3.



In V3 SA bayes was decisive, when well trained it was awesome.

Nov in V4.0.1 bayes is NON decisive, and in 90% of cases it gives 
me BAYES_40 or _50 even after I mark those mails as SPAM OR HAM.



What is even more weird is, that some mails aren`t even bayes 
scored at all. BAYES_XX is missing from headers entirely and I


don`t know why...


I`m kind of sorry that I upgraded to 4.0.1...



looking at your first mail, it seems that you only have tokens for a 
few days:


dbg: bayes: corpus size: nspam = 1190, nham = 12441 dbg: bayes: DB 
expiry: tokens in \
DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, 
Newest atime: \

1725888528, Last expire: 0, Current time: 1725888537

% date -d @1725361640
Tue Sep  3 13:07:20 CEST 2024

% date -d @1725888528
Mon Sep  9 15:28:48 CEST 2024


How do you call spamassassin, directly, via spamass-milter, amavis 
or other way?

Did you tune any bayes settings?
Do you have your trusted_networks and internal_networks set up 
properly?


--



Re: Tips on training bayes?

2024-09-18 Thread Matus UHLAR - fantomas

On 18.09.24 16:19, natan wrote:
> I was very disappointed with spamassassin 4.x because it started to
> grow /var/lib/amavis/tmp/

amavis should clean this itself.
which amavis version do you have installed?
did you tune it anyhow?

Did you enable and configure extracttext plugin?
Because that one may be kinda filling it up.

> With SA 3.4.X - on average 100MB and it deletes on the fly
> With SA 4.X - on average 2-6GB and I had to do a quick fix:
> 59 23 * * * root find /var/lib/amavis/tmp/ -mtime +0 -delete;

On 18.09.2024 at 16:09, Matus UHLAR - fantomas wrote:

On 18.09.24 13:42, Grega via users wrote:
Right now in SA 4.0.1 bayes at least for me is really challenging 
to train and set up.


I had good trained DB from past V3 install, and it behaved really odd.

I trained it on new set of mails 3000 spam and 3000 ham (HAND 
PICKED mail it was PAIN) and I cant get either BAYES_00 or 
BAYES_99 :)


I mean I get them occasionally, but not even close to what it was in V3.


In V3 SA bayes was decisive, when well trained it was awesome.

Nov in V4.0.1 bayes is NON decisive, and in 90% of cases it gives 
me BAYES_40 or _50 even after I mark those mails as SPAM OR HAM.



What is even more weird is, that some mails aren`t even bayes 
scored at all. BAYES_XX is missing from headers entirely and I


don`t know why...


I`m kind of sorry that I upgraded to 4.0.1...



looking at your first mail, it seems that you only have tokens for a 
few days:


dbg: bayes: corpus size: nspam = 1190, nham = 12441 dbg: bayes: DB 
expiry: tokens in \
DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, 
Newest atime: \

1725888528, Last expire: 0, Current time: 1725888537

% date -d @1725361640
Tue Sep  3 13:07:20 CEST 2024

% date -d @1725888528
Mon Sep  9 15:28:48 CEST 2024


How do you call spamassassin, directly, via spamass-milter, amavis 
or other way?

Did you tune any bayes settings?
Do you have your trusted_networks and internal_networks set up properly?


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
There's a long-standing bug relating to the x86 architecture that
allows you to install Windows.   -- Matthew D. Fuller


Re: Tips on training bayes?

2024-09-18 Thread natan

Hi
I was very disappointed with spamassassin 4.x because it started to grow 
/var/lib/amavis/tmp/


With SA 3.4.X - on average 100MB and it deletes on the fly
With SA 4.X - on average 2-6GB and I had to do a quick fix:
59 23 * * * root find /var/lib/amavis/tmp/ -mtime +0 -delete;
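
Before letting a nightly `-delete` loose, it can help to see what would be removed. The following is a minimal dry-run sketch, not an equivalent of the `find` command above: it only looks at the top level of the directory, it never deletes anything, and the function name `stale_entries` and its parameters are made up for illustration.

```python
import os
import time


def stale_entries(directory, max_age_seconds=24 * 60 * 60, now=None):
    """Return the names in `directory` last modified at least `max_age_seconds` ago.

    Only the top level is inspected and nothing is removed; this is a
    dry-run listing, not a cleanup tool.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age >= max_age_seconds:
            stale.append(entry.name)
    return sorted(stale)
```

Calling `stale_entries("/var/lib/amavis/tmp/")` on the mail host would list the leftover work directories that are older than a day, which gives an idea of how much the cron job is really cleaning up.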

On 18.09.2024 at 16:09, Matus UHLAR - fantomas wrote:

On 18.09.24 13:42, Grega via users wrote:
Right now in SA 4.0.1 bayes at least for me is really challenging to 
train and set up.


I had good trained DB from past V3 install, and it behaved really odd.

I trained it on new set of mails 3000 spam and 3000 ham (HAND PICKED 
mail it was PAIN) and I cant get either BAYES_00 or BAYES_99 :)


I mean I get them occasionally, but not even close to what it was in V3.


In V3 SA bayes was decisive, when well trained it was awesome.

Nov in V4.0.1 bayes is NON decisive, and in 90% of cases it gives me 
BAYES_40 or _50 even after I mark those mails as SPAM OR HAM.



What is even more weird is, that some mails aren`t even bayes scored 
at all. BAYES_XX is missing from headers entirely and I


don`t know why...


I`m kind of sorry that I upgraded to 4.0.1...



looking at your first mail, it seems that you only have tokens for a 
few days:


dbg: bayes: corpus size: nspam = 1190, nham = 12441 dbg: bayes: DB 
expiry: tokens in \
DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, Newest 
atime: \

1725888528, Last expire: 0, Current time: 1725888537

% date -d @1725361640
Tue Sep  3 13:07:20 CEST 2024

% date -d @1725888528
Mon Sep  9 15:28:48 CEST 2024


How do you call spamassassin, directly, via spamass-milter, amavis or 
other way?

Did you tune any bayes settings?
Do you have your trusted_networks and internal_networks set up properly?



--



Re: Tips on training bayes?

2024-09-18 Thread Matus UHLAR - fantomas

On 18.09.24 13:42, Grega via users wrote:

Right now in SA 4.0.1 bayes at least for me is really challenging to train and 
set up.

I had good trained DB from past V3 install, and it behaved really odd.

I trained it on new set of mails 3000 spam and 3000 ham (HAND PICKED mail it 
was PAIN) and I cant get either BAYES_00 or BAYES_99 :)

I mean I get them occasionally, but not even close to what it was in V3.


In V3 SA bayes was decisive, when well trained it was awesome.

Nov in V4.0.1 bayes is NON decisive, and in 90% of cases it gives me BAYES_40 
or _50 even after I mark those mails as SPAM OR HAM.


What is even more weird is, that some mails aren`t even bayes scored at all. 
BAYES_XX is missing from headers entirely and I

don`t know why...


I`m kind of sorry that I upgraded to 4.0.1...



looking at your first mail, it seems that you only have tokens for a few 
days:


dbg: bayes: corpus size: nspam = 1190, nham = 12441
dbg: bayes: DB expiry: tokens in DB: 979401, Expiry max size: 150, \
  Oldest atime: 1725361640, Newest atime: 1725888528, Last expire: 0, \
  Current time: 1725888537

% date -d @1725361640
Tue Sep  3 13:07:20 CEST 2024

% date -d @1725888528
Mon Sep  9 15:28:48 CEST 2024
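
The same check can be done without `date`, which also gives the length of the window directly. This is just a sketch that reuses the two atime values from the debug output above; the variable names are arbitrary, and the times are printed in UTC, so they read two hours earlier than the CEST output.

```python
from datetime import datetime, timezone

# Values copied from the "Oldest atime" / "Newest atime" debug fields.
OLDEST_ATIME = 1725361640
NEWEST_ATIME = 1725888528

oldest = datetime.fromtimestamp(OLDEST_ATIME, tz=timezone.utc)
newest = datetime.fromtimestamp(NEWEST_ATIME, tz=timezone.utc)
span = newest - oldest  # how much history the token database covers

print(oldest.isoformat())
print(newest.isoformat())
print(span)
```

The span comes out at a little over six days, which is what "tokens for a few days" refers to.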


How do you call spamassassin, directly, via spamass-milter, amavis or other 
way?

Did you tune any bayes settings?
Do you have your trusted_networks and internal_networks set up properly?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Spam = (S)tupid (P)eople's (A)dvertising (M)ethod


Re: Tips on training bayes?

2024-09-18 Thread Grega via users
Right now in SA 4.0.1 bayes, at least for me, is really challenging to train and
set up.

I had a well-trained DB from a past V3 install, and it behaved really oddly.

I trained it on a new set of mails, 3000 spam and 3000 ham (HAND PICKED mail, it
was a PAIN), and I can't get either BAYES_00 or BAYES_99 :)

I mean I get them occasionally, but not even close to what it was in V3.


In V3 SA bayes was decisive; when well trained it was awesome.

Now in V4.0.1 bayes is NOT decisive, and in 90% of cases it gives me BAYES_40
or _50 even after I mark those mails as SPAM OR HAM.


What is even weirder is that some mails aren't even bayes scored at all.
BAYES_XX is missing from the headers entirely and I don't know why...


I'm kind of sorry that I upgraded to 4.0.1...


Regards, G



From: Alex 
Sent: Tuesday, 17 September 2024 22:29
To: SA Mailing list
Subject: Re: Tips on training bayes?


It is up to the user, ie you, what is and what is not spam.

Well, yes, and no.

Of course it's my own system and I can define these terms however I wish. I'm 
also familiar with the need to investigate every message - perhaps I should 
have made that clear initially.

It's only these few types of messages that are very subjective and experience 
from the broader open source community would be appreciated.

If it has a legitimate unsubscribe link, does that make it ham?

What criteria do you use to determine "spamminess/haminess of EVERY message"?

Is the goal to have every message one of either BAYES_00 or BAYES_99 or is it 
okay that newsletters (for example) are BAYES_50, and let other rules, like 
network checks, determine the score?

Thanks,
Alex



RE: non-free Services

2024-09-18 Thread Simon Standley
We use invaluement.com ... good for the stuff that often slips by. Your mileage
may vary, etc.

-----Original Message-----
From: Philipp Ewald  
Sent: 18 September 2024 11:27
To: users@spamassassin.apache.org
Subject: Re: non-free Services

Hello,

>The idea is that you can use those services for free if you are a small 
>user (spam filter for me and my dog) but if you start to look like a 
>commercial service yourself, you need to pay your part.

Yes we use commercial. We allready paying SURBL because we got a information 
about limits.


Thanks i will check them and if we will hit limits.

> Some RBL are cheaper. Some are open to deals when you share data. Some 
> are worth every penny of their expensive plans. Some good RBL are not in 
> standard spamassassin

Some recommendations?

many thnaks

On 18.09.24 at 12:18, Laurent S. wrote:
> On 18.09.24 11:37, Philipp Ewald wrote:
>> Hello,,
>>
>> im searching for all non-free comercial services in Spamassasin.
>>
>>
>> ATM i found:
>> dns_query_restriction deny sorbs.net
>> dns_query_restriction deny bl.mailspike.net
>> dns_query_restriction deny wl.mailspike.net
>> Spamcop (ZEN)
>>
>> Does i need to disable other services as well?
>> cant find any official information.
>>
>>
>> We have high mail volume so "free" use is not possilbe.
>>
>>
>> Kind regards
>> Philipp
> Depending on the amount to traffic you have, dnswl.org, uribl.com,
> surbl.org will also start blocking your requests. If you monitor
> spamassassin hits for RCVD_IN_DNSWL_BLOCKED, URIBL_BLOCKED or
> SURBL_BLOCKED, you'll know you are above limits.
>
> The idea is that you can use those services for free if you are a small
> user (spam filter for me and my dog) but if you start to look like a
> commercial service yourself, you need to pay your part.
>
> Some RBL are cheaper. Some are open to deals when you share data. Some
> are worth every penny of their expensive plans. Some good RBL are not in
> standard spamassassin. I'd advise you to make sure you still have a
> decent amount of RBL still active.
>
> Best,
> Laurent
>



Good DNSBLs not in standard spamassassin (Was Re: non-free Services)

2024-09-18 Thread Andy Smith
Hi,

On Wed, Sep 18, 2024 at 10:18:18AM +, Laurent S. wrote:
> Some good RBL are not in standard spamassassin.

Out of interest, which DNSBLs do you use/recommend that are not in
standard spamassassin?

Thanks,
Andy


Re: non-free Services

2024-09-18 Thread Philipp Ewald

Hello,

> The idea is that you can use those services for free if you are a small
> user (spam filter for me and my dog) but if you start to look like a
> commercial service yourself, you need to pay your part.

Yes, we use them commercially. We are already paying SURBL because we got a
notice about limits.

Thanks, I will check them and whether we will hit limits.

> Some RBL are cheaper. Some are open to deals when you share data. Some
> are worth every penny of their expensive plans. Some good RBL are not in
> standard spamassassin

Some recommendations?

Many thanks

On 18.09.24 at 12:18, Laurent S. wrote:

On 18.09.24 11:37, Philipp Ewald wrote:

Hello,,

im searching for all non-free comercial services in Spamassasin.


ATM i found:
dns_query_restriction deny sorbs.net
dns_query_restriction deny bl.mailspike.net
dns_query_restriction deny wl.mailspike.net
Spamcop (ZEN)

Does i need to disable other services as well?
cant find any official information.


We have high mail volume so "free" use is not possilbe.


Kind regards
Philipp

Depending on the amount to traffic you have, dnswl.org, uribl.com,
surbl.org will also start blocking your requests. If you monitor
spamassassin hits for RCVD_IN_DNSWL_BLOCKED, URIBL_BLOCKED or
SURBL_BLOCKED, you'll know you are above limits.

The idea is that you can use those services for free if you are a small
user (spam filter for me and my dog) but if you start to look like a
commercial service yourself, you need to pay your part.

Some RBL are cheaper. Some are open to deals when you share data. Some
are worth every penny of their expensive plans. Some good RBL are not in
standard spamassassin. I'd advise you to make sure you still have a
decent amount of RBL still active.

Best,
Laurent





Re: non-free Services

2024-09-18 Thread Laurent S.
On 18.09.24 11:37, Philipp Ewald wrote:
> Hello,,
> 
> im searching for all non-free comercial services in Spamassasin.
> 
> 
> ATM i found:
> dns_query_restriction deny sorbs.net
> dns_query_restriction deny bl.mailspike.net
> dns_query_restriction deny wl.mailspike.net
> Spamcop (ZEN)
> 
> Does i need to disable other services as well?
> cant find any official information.
> 
> 
> We have high mail volume so "free" use is not possilbe.
> 
> 
> Kind regards
> Philipp

Depending on the amount of traffic you have, dnswl.org, uribl.com,
surbl.org will also start blocking your requests. If you monitor
spamassassin hits for RCVD_IN_DNSWL_BLOCKED, URIBL_BLOCKED or 
SURBL_BLOCKED, you'll know you are above limits.
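
One way to do that monitoring is to count how often those rule names turn up in whatever you log or in stored X-Spam-Status headers. The sketch below assumes nothing about your log format beyond "one line of text per result that contains the rule names"; the function name `count_blocked_hits` and the sample lines are invented for the example.

```python
import re
from collections import Counter

# Rule names as given in the message above.
BLOCKED_RULES = ("RCVD_IN_DNSWL_BLOCKED", "URIBL_BLOCKED", "SURBL_BLOCKED")
# Word boundaries keep a longer rule name from matching by accident.
_PATTERN = re.compile(r"\b(" + "|".join(BLOCKED_RULES) + r")\b")


def count_blocked_hits(lines):
    """Count occurrences of the *_BLOCKED rule names in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(_PATTERN.findall(line))
    return counts


# Hypothetical sample input, for illustration only.
sample = [
    "X-Spam-Status: No, score=1.2 tests=BAYES_50,URIBL_BLOCKED",
    "X-Spam-Status: No, score=-0.1 tests=RCVD_IN_DNSWL_BLOCKED,SPF_PASS",
    "X-Spam-Status: No, score=0.0 tests=SPF_PASS",
]
print(count_blocked_hits(sample))
```

Any non-zero count that keeps recurring is the sign that the list operator is refusing your queries.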

The idea is that you can use those services for free if you are a small 
user (spam filter for me and my dog) but if you start to look like a 
commercial service yourself, you need to pay your part.

Some RBL are cheaper. Some are open to deals when you share data. Some 
are worth every penny of their expensive plans. Some good RBL are not in 
standard spamassassin. I'd advise you to make sure you still have a 
decent amount of RBL still active.

Best,
Laurent



Re: non-free Services

2024-09-18 Thread Philipp Ewald

OK, thank for that input.

On 18.09.24 at 11:46, Marc wrote:

im searching for all non-free comercial services in Spamassasin.


ATM i found:
dns_query_restriction deny sorbs.net
dns_query_restriction deny bl.mailspike.net
dns_query_restriction deny wl.mailspike.net
Spamcop (ZEN)

Does i need to disable other services as well?
cant find any official information.


We have high mail volume so "free" use is not possilbe.



I am also thinking of scaling the email business for quite some time, but did 
not do shit in practice. However I was wondering what the limits are you are 
reaching?

Eg. if you are borderline, maybe you could think about how to arrange your 
processing flow. I don't think it is efficient to use dnsbl in spamassassin. 
With rearranging your processing I mean before you even start doing dns lookups 
start rejecting emails.
The other thing I can think of is rearranging your dns servers and caching so 
you bring back the amount of queries you send there.






RE: non-free Services

2024-09-18 Thread Marc
> 
> im searching for all non-free comercial services in Spamassasin.
> 
> 
> ATM i found:
> dns_query_restriction deny sorbs.net
> dns_query_restriction deny bl.mailspike.net
> dns_query_restriction deny wl.mailspike.net
> Spamcop (ZEN)
> 
> Does i need to disable other services as well?
> cant find any official information.
> 
> 
> We have high mail volume so "free" use is not possilbe.
> 
> 

I am also thinking of scaling the email business for quite some time, but did 
not do shit in practice. However I was wondering what the limits are you are 
reaching?

Eg. if you are borderline, maybe you could think about how to arrange your 
processing flow. I don't think it is efficient to use dnsbl in spamassassin. 
With rearranging your processing I mean before you even start doing dns lookups 
start rejecting emails. 
The other thing I can think of is rearranging your dns servers and caching so
you bring down the number of queries you send there.




Re: Use of uninitialized value $response[0]

2024-09-17 Thread Niamh Holding


Hello Bill,

Tuesday, September 17, 2024, 7:15:49 PM, you wrote:

BC> The likely root cause there is the lack of any reply from the Pyzor server, 
which is unlikely to be a per-user
BC> condition.

But another user logs this-

procmail: Match on "< 512000"
procmail: Locking "spamassassin.lock"
procmail: Executing "/usr/local/bin/spamassassin"
procmail: [14159] Tue Sep 17 17:34:48 2024
procmail: Unlocking "spamassassin.lock"
procmail: No match on "^X-Spam-Status: Yes"
procmail: No match on "^^rom[ ]"
procmail: Assigning 
"LASTFOLDER=Maildir/new/1726590882.14159_1.potassium.holtain.net"

-- 
Best regards,
 Niamh mailto:ni...@fullbore.co.uk



Re: Tips on training bayes?

2024-09-17 Thread Alex
>
>
> It is up to the user, ie you, what is and what is not spam.
>

Well, yes, and no.

Of course it's my own system and I can define these terms however I wish.
I'm also familiar with the need to investigate every message - perhaps I
should have made that clear initially.

It's only these few types of messages that are very subjective and
experience from the broader open source community would be appreciated.

If it has a legitimate unsubscribe link, does that make it ham?

What criteria do you use to determine "spamminess/haminess of EVERY
message"?

Is the goal to have every message one of either BAYES_00 or BAYES_99 or is
it okay that newsletters (for example) are BAYES_50, and let other rules,
like network checks, determine the score?

Thanks,
Alex


Re: Use of uninitialized value $response[0]

2024-09-17 Thread Bill Cole
On 2024-09-17 at 13:10:13 UTC-0400 (Tue, 17 Sep 2024 18:10:13 +0100)
Niamh Holding 
is rumored to have said:

> Hello
>
> I'm seeing the following logged by Procmail in one and only one mailbox and 
> as far as I can see there is no difference in the Procmail recipe calling 
> Spamassassin in all the mailboxes
>
>  Procmail: Match on "< 256000"
> procmail: Locking "spamassassin.lock"
> procmail: Executing "/usr/local/bin/spamassassin"
> Sep 17 18:08:24.727 [16350] warn: no response
> Sep 17 18:08:24.727 [16350] warn: Use of uninitialized value $response[0] in 
> pattern match (m//) at 
> /usr/local/share/perl5/Mail/SpamAssassin/Plugin/Pyzor.pm line 307.
> procmail: [16344] Tue Sep 17 18:08:25 2024
> procmail: Unlocking "spamassassin.lock"

You should upgrade to 4.0.1. That error on that line indicates that you are 
running an obsolete 3.4.x version.

The likely root cause there is the lack of any reply from the Pyzor server, 
which is unlikely to be a per-user
condition.

-- 
Bill Cole


Re: Tips on training bayes?

2024-09-17 Thread Benny Pedersen

Jared Hall via users wrote on 2024-09-17 08:15:

> On 9/16/2024 8:48 PM, Alex wrote:
>
>> Hi,
>> Now that I'm using SA4, and my bayes database is quite old, I'd like
>> to retrain it with new ham and spam. I hoped someone had some pointers
>> on some of the gray area and what you consider to be spam and ham.
>>
>> Are reliable newsletters, like those from, say, a trusted news source
>> where the user opts into the subscription, considered ham? What if you
>> can't quite tell whether it was opt-in but it doesn't hit any current
>> negative rules like blocklists or sender reputation or DCC? I'm
>> assuming it's best to leave it to later be reported as BAYES_50?
>>
>> Are unsolicited emails always considered spam?
>>
>> Thanks,
>> Alex
>
> +1

or change to the redis backend, where sa4 has ttl limits, keeping it more
fresh; if this is missing on other backends, it would be possible to solve it
for other backends imho :=)

OP please make a bugzilla if not using redis for bayes





Re: Tips on training bayes?

2024-09-16 Thread Jared Hall via users




On 9/16/2024 8:48 PM, Alex wrote:

> Hi,
> Now that I'm using SA4, and my bayes database is quite old, I'd like
> to retrain it with new ham and spam. I hoped someone had some pointers
> on some of the gray area and what you consider to be spam and ham.
>
> Are reliable newsletters, like those from, say, a trusted news source
> where the user opts into the subscription, considered ham? What if you
> can't quite tell whether it was opt-in but it doesn't hit any current
> negative rules like blocklists or sender reputation or DCC? I'm
> assuming it's best to leave it to later be reported as BAYES_50?
>
> Are unsolicited emails always considered spam?
>
> Thanks,
> Alex

+1

-- Jared Hall





Re: Bayes in V4 compared to V3

2024-09-13 Thread John Hardin

On Fri, 13 Sep 2024, Bill Cole wrote:


> Please send any replies to the list only.


...or to Harald only.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #20: The faster you finish the fight,
  the less shot you will get.
---
 Today: the 459th anniversary of the muslim Ottoman defeat at Malta


Re: Bayes in V4 compared to V3

2024-09-13 Thread Benny Pedersen

Grega via users wrote on 2024-09-13 16:16:

> Sorry guys if I replied to all, my intentions were not to spam :)

top posters :)

imho not impossible to request 3rd party list archives to make a
password for users, never mind


eggs came before chickens :=)







Re: Bayes in V4 compared to V3

2024-09-13 Thread Grega via users
Sorry guys if I replied to all, my intentions were not to spam :)



From: Benny Pedersen 
Sent: Friday, 13 September 2024 15:13
To: users@spamassassin.apache.org
Subject: Re: Bayes in V4 compared to V3

Bill Cole wrote on 2024-09-13 15:03:

> Please send any replies to the list only.

unsubscribe listarchivers ?

and make archived on apache.org with bugzilla login

don't know if it will help or not, but chicken and egg



Noise Around This List (was Re: Bayes in V4 compared to V3)

2024-09-13 Thread Bill Cole

On 2024-09-13 at 09:13:58 UTC-0400 (Fri, 13 Sep 2024 15:13:58 +0200)
Benny Pedersen 
is rumored to have said:


> Bill Cole wrote on 2024-09-13 15:03:
>
>> Please send any replies to the list only.
>
> unsubscribe listarchivers ?
>
> and make archived on apache.org with bugzilla login
>
> don't know if it will help or not, but chicken and egg


ASF has a core principle that our projects are managed and supported 
transparently. Restricting the ability to read any users@*.a.o list 
would be a severe departure from that principle. Subscribers must be 
disruptive to the list  on a persistent basis to be banned, by a 
unanimous consensus of the PMC. It is a very high bar.


Note that we also don't exert prior control over who can submit to our 
Bugzilla.  We handle spam there on a whack-a-mole basis, which has 
proven adequate for many years.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Bayes in V4 compared to V3

2024-09-13 Thread Antony Stone
On Friday 13 September 2024 at 15:13:58, Benny Pedersen wrote:

> Bill Cole wrote on 2024-09-13 15:03:
> > Please send any replies to the list only.
> 
> unsubscribe listarchivers ?
> and make archived on apache.org with bugzilla login
> don't know if it will help or not, but chicken and egg

I don't think we want to do anything to make list archives less available to 
people with questions in the future.  They should be open, public and 
unencumbered with any sort of login or access control.


Antony.

-- 
Perfection in design is achieved not when there is nothing left to add, but 
rather when there is nothing left to take away.

 - Antoine de Saint-Exupery

   Please reply to the list;
 please *don't* CC me.


Re: Bayes in V4 compared to V3

2024-09-13 Thread Benny Pedersen

Bill Cole wrote on 2024-09-13 15:03:

> Please send any replies to the list only.


unsubscribe listarchivers ?

and make archived on apache.org with bugzilla login

don't know if it will help or not, but chicken and egg



Re: Bayes in V4 compared to V3

2024-09-13 Thread Bill Cole
Please note that "Reindl Harald" is excluded from posting to the 
SpamAssassin Users mailing list as a consequence of past behavior. It is 
my understanding that they still follow the list via some public archive 
and reply off-list whenever they have an opportunity to be rude towards 
people with SpamAssassin difficulties.


Whether or not their advice is worth considering is obviously a personal 
judgment, but you should be aware that you are speaking with someone who 
has in the past worked to disrupt this list (and others.)


Please send any replies to the list only.

On 2024-09-13 at 05:00:17 UTC-0400 (Fri, 13 Sep 2024 09:00:17 +)
Grega 
is rumored to have said:


Do you have V3 or V4 SA?



From: Reindl Harald (privat) 
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3

autolearn was always a blackbox

that below are the stats for the current month and that bayes is built
from 2014 until now and i rebuild it from scratch every month

the corpus of 178.138 messages is stored as single eml-files

a few errors with autolearn over the years can amplify and render your
bayes useless over time with no way to do anything because you don't
have the corpus and don't know what was trained how

[root@mail-gw:~]$ bayes-stats.sh
0     135700    SPAM
0      42438    HAM
0    5116765    TOKEN

total 514M
 24K -rw-r- 1 sa-milt sa-milt  24K 2024-09-12 14:11 bayes_seen
129M -rw-r- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db

BAYES_00      4455   45.10 %
BAYES_05       363    3.67 %
BAYES_20       471    4.76 %
BAYES_40       440    4.45 %
BAYES_50      2106   21.32 %
BAYES_60       119    1.20 %     5.87 % (OF TOTAL BLOCKED)
BAYES_80       108    1.09 %     5.33 % (OF TOTAL BLOCKED)
BAYES_95        81    0.82 %     4.00 % (OF TOTAL BLOCKED)
BAYES_99      1735   17.56 %    85.72 % (OF TOTAL BLOCKED)
BAYES_999     1572   15.91 %    77.66 % (OF TOTAL BLOCKED)

DELIVERED    13865   88.15 %
DNSWL        14376   91.40 %
SPF          15203   96.66 %
SPF/DKIM WL   5705   36.27 %
SHORTCIRCUIT  5894   37.47 %

BLOCKED       2024   12.86 %
SPAMMY        2043   12.98 %   100.93 % (OF TOTAL BLOCKED)

Am 13.09.24 um 10:51 schrieb Grega:

This strategy worked really well in V3, and bayes was excellent even
with autotrain and occasional manual training.


Now it's indecisive and useless, at least for me.

We have around 5k-7k daily mails...




*From:* Reindl Harald (privat) 
*Sent:* Friday, 13 September 2024 10:22
*To:* Grega; Bill Cole; Grega via users
*Subject:* Re: Bayes in V4 compared to V3


Am 13.09.24 um 06:53 schrieb Grega via users:
And I'm reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
auto-train on correct mails...


this is even more nonsense than autolearn itself

what you really want to train are wrongly classified messages, and that
decision can only be made by a human

if you train wrongly classified mails in both directions you amplify the
incorrect result

it happens that HAM MAILS have a score above 12 from time to time
because of blacklists and over-aggressive rules and when you then
autolearn the content as spam your bayes will result in what it is now



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: Bayes in V4 compared to V3

2024-09-13 Thread Grega via users
Do you have V3 or V4 SA?



From: Reindl Harald (privat) 
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3

autolearn was always a blackbox

that below are the stats for the current month and that bayes is built
from 2014 until now and i rebuild it from scratch every month

the corpus of 178.138 messages is stored as single eml-files

a few errors with autolearn over the years can amplify and render your
bayes useless over time with no way to do anything because you don't
have the corpus and don't know what was trained how

[root@mail-gw:~]$ bayes-stats.sh
0   135700  SPAM
0    42438  HAM
0  5116765  TOKEN

total 514M
 24K -rw-r- 1 sa-milt sa-milt  24K 2024-09-12 14:11 bayes_seen
129M -rw-r- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db

BAYES_00      4455   45.10 %
BAYES_05       363    3.67 %
BAYES_20       471    4.76 %
BAYES_40       440    4.45 %
BAYES_50      2106   21.32 %
BAYES_60       119    1.20 %     5.87 % (OF TOTAL BLOCKED)
BAYES_80       108    1.09 %     5.33 % (OF TOTAL BLOCKED)
BAYES_95        81    0.82 %     4.00 % (OF TOTAL BLOCKED)
BAYES_99      1735   17.56 %    85.72 % (OF TOTAL BLOCKED)
BAYES_999     1572   15.91 %    77.66 % (OF TOTAL BLOCKED)

DELIVERED    13865   88.15 %
DNSWL        14376   91.40 %
SPF          15203   96.66 %
SPF/DKIM WL   5705   36.27 %
SHORTCIRCUIT  5894   37.47 %

BLOCKED       2024   12.86 %
SPAMMY        2043   12.98 %   100.93 % (OF TOTAL BLOCKED)

Am 13.09.24 um 10:51 schrieb Grega:
> This strategy worked really well in V3, and bayes was excellent even
> with autotrain and occasional manual training.
>
>
> Now it's indecisive and useless, at least for me.
>
> We have around 5k-7k daily mails...
>
>
>
> 
> *From:* Reindl Harald (privat) 
> *Sent:* Friday, 13 September 2024 10:22
> *To:* Grega; Bill Cole; Grega via users
> *Subject:* Re: Bayes in V4 compared to V3
>
>
> Am 13.09.24 um 06:53 schrieb Grega via users:
>> And I`m reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
>> auto-train on correct mails...
>
> this is even more nonsense than autolearn itself
>
> what you really want to train are wrongly classified messages, and that
> decision can only be made by a human
>
> if you train wrongly classified mails in both directions you amplify the
> incorrect result
>
> it happens that HAM MAILS have a score above 12 from time to time
> because of blacklists and over-aggressive rules and when you then
> autolearn the content as spam your bayes will result in what it is now


Re: Bayes in V4 compared to V3

2024-09-13 Thread Grega via users
This strategy worked really well in V3, and bayes was excellent even with 
autotrain and occasional manual training.


Now it's indecisive and useless, at least for me.

We have around 5k-7k daily mails...



From: Reindl Harald (privat) 
Sent: Friday, 13 September 2024 10:22
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3



Am 13.09.24 um 06:53 schrieb Grega via users:
> And I`m reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
> auto-train on correct mails...

this is even more nonsense than autolearn itself

what you really want to train are wrongly classified messages, and that
decision can only be made by a human

if you train wrongly classified mails in both directions you amplify the
incorrect result

it happens that HAM MAILS have a score above 12 from time to time
because of blacklists and over-aggressive rules and when you then
autolearn the content as spam your bayes will result in what it is now



Re: Bayes in V4 compared to V3

2024-09-12 Thread Grega via users
Hi.


I just filtered in last week and I have

BAYES_20

BAYES_40

BAYES_50

BAYES_80


So no BAYES_00, _05, _90,_95 etc...


All the extreme values, which are the only ones useful for real scoring and 
marking, are missing.

Today I'm going to train bayes manually with around 4000 SPAM and 4000 HAM and 
will see what happens.


And I`m reconfiguring autolearn to -4 for HAM and 12 for SPAM to really 
auto-train on correct mails...
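
For readers following along, the threshold change above can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not SpamAssassin's actual logic: the function name autolearn_decision is made up for illustration, and real autolearning uses its own score calculation plus extra conditions. In the SA configuration the two values correspond, as far as I know, to bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.

```python
def autolearn_decision(score, ham_threshold=-4.0, spam_threshold=12.0):
    """Simplified model of threshold-based autolearning (illustrative only).

    Returns "ham", "spam", or None when the message is not learned at all.
    The default thresholds are the -4 / 12 values mentioned in this thread.
    """
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None


if __name__ == "__main__":
    # A legitimate mail pushed above 12 by blacklists and aggressive rules
    # would still be learned as spam, which is the risk raised in the thread.
    for score in (-5.2, 3.0, 13.1):
        print(score, autolearn_decision(score))
```

Widening the gap between the two thresholds reduces how many messages are learned, but it does not make the learned ones correct; that still depends on the rules that produced the score.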


You said: "There were substantial changes in the Bayes module between v3 and 
v4. "

This is all I needed really :)


So I will manually adjust BAYES scores and this should help me achieve desired 
results..


About BAYES missing...

I have NO load, server is almost idle

BAYES in MariaDB so performance should not be problem.

Shortcircuit is not enabled.


Regards,

Grega



From: Bill Cole 
Sent: Thursday, 12 September 2024 21:38
To: Grega via users
Subject: Re: Bayes in V4 compared to V3


On 2024-09-12 at 14:05:11 UTC-0400 (Thu, 12 Sep 2024 18:05:11 +0000)
Grega via users 
is rumored to have said:

Hi.

I have SA 4.0.1 configured it, all is good, except for bayes. It IS working, it 
IS learning but when it classifies mail it is really not so decisive as it was 
in V3.
I have:

dbg: bayes: corpus size: nspam = 1190, nham = 12441
dbg: bayes: DB expiry: tokens in DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, Newest atime: 1725888528, Last expire: 0, Current time: 1725888537
So I have enough spam/ham and really enough tokens...
What I find weird is this:
BAYES_50 and BAYES_40 have like 10.000 hits EACH which is ALOT

BAYES_80 only 600
BAYES_95 even less: 341
BAYES_99: 284
BAYES_20 only 150
BAYES_60 only 87
I have no BAYES lower than 40 at all.

What's that BAYES_20 line then?

I am training and also use autolearn.
I have also transferred corpus trained on SA v3 where it worked correctly.
Is Spamassassin v4 really so much more conservative or am I doing something 
wrong here?

There were substantial changes in the Bayes module between v3 and v4. Training 
the exact same corpus in the exact same order into v3.4x and 4.0x will yield 
different scores, due to *bug fixes* and *improvements* in parsing headers. In 
principle this should make scoring more consistent and accurate, which may mean 
fewer extreme scores. In theory, better parsing should result in some common 
tokens being split differently, yielding more diversity in their metrics. We 
also updated 'stopword' lists for various languages, removing tokens that are 
so common that they cannot help scoring in principle.

So, no, you are not doing anything wrong. We may need to re-examine the default 
scores for the BAYES_* rules to adapt but that has no concrete plan behind it.

With that said, I looked at recent logs on one system running the SA 
development trunk (which has no added Bayes changes relative to 4.0.1) and got 
this distribution:

16444 BAYES_00
20 BAYES_05
22 BAYES_20
13 BAYES_40
64 BAYES_50
2 BAYES_60
6 BAYES_80
2 BAYES_95
139 BAYES_99
138 BAYES_999

That is a machine that excludes most blatant spam at the SMTP layer, without 
handing it to SA.


Also;
One more thing...
Some mails even dont have BAYES added in score list, confirmed on 2 installs

How many?

While you are initially training the Bayes DB and lack adequate ham and spam 
counts, you get no BAYES hits. Also, if you have any rules set to 
"shortcircuit" they can cause SA to stop checking before Bayes is done.

I *think* I've also seen Bayes skip on excess load, with too much lock 
contention on a file-based mechanism like Berkeley DB.


   b...@scconsult.com or billc...@apache.org
   (AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)
   Not Currently Available For Hire



Re: Bayes in V4 compared to V3

2024-09-12 Thread Bill Cole

On 2024-09-12 at 14:05:11 UTC-0400 (Thu, 12 Sep 2024 18:05:11 +0000)
Grega via users 
is rumored to have said:


Hi.

I have SA 4.0.1 configured it, all is good, except for bayes. It IS 
working, it IS learning but when it classifies mail it is really not 
so decisive as it was in V3.

I have:

dbg: bayes: corpus size: nspam = 1190, nham = 12441
dbg: bayes: DB expiry: tokens in DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, Newest atime: 1725888528, Last expire: 0, Current time: 1725888537

So I have enough spam/ham and really enough tokens...
What I find weird is this:
BAYES_50 and BAYES_40 have like 10.000 hits EACH which is ALOT

BAYES_80 only 600
BAYES_95 even less: 341
BAYES_99: 284
BAYES_20 only 150
BAYES_60 only 87
I have no BAYES lower than 40 at all.


What's that BAYES_20 line then?


I am training and also use autolearn.
I have also transferred corpus trained on SA v3 where it worked 
correctly.
Is Spamassassin v4 really so much more conservative or am I doing 
something wrong here?


There were substantial changes in the Bayes module between v3 and v4. 
Training the exact same corpus in the exact same order into v3.4x and 
4.0x will yield different scores, due to *bug fixes* and *improvements* 
in parsing headers. In principle this should make scoring more 
consistent and accurate, which may mean fewer extreme scores. In theory, 
better parsing should result in some common tokens being split 
differently, yielding more diversity in their metrics. We also updated 
'stopword' lists for various languages, removing tokens that are so 
common that they cannot help scoring in principle.


So, no, you are not doing anything wrong. We may need to re-examine the 
default scores for the BAYES_* rules to adapt but that has no concrete 
plan behind it.


With that said, I looked at recent logs on one system running the SA 
development trunk (which has no added Bayes changes relative to 4.0.1) 
and got this distribution:


16444 BAYES_00
  20 BAYES_05
  22 BAYES_20
  13 BAYES_40
  64 BAYES_50
   2 BAYES_60
   6 BAYES_80
   2 BAYES_95
 139 BAYES_99
 138 BAYES_999

That is a machine that excludes most blatant spam at the SMTP layer, 
without handing it to SA.
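
A distribution like the one above can be tallied from logs with a few lines of Python. This is only a sketch: the log line format in the demo is an assumption (any text that lists the rule names that hit will do), and the helper names bayes_distribution and share_of_total are made up for illustration.

```python
import re
from collections import Counter

# Matches rule names such as BAYES_00, BAYES_50, BAYES_999.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_distribution(lines):
    """Count how often each BAYES_* rule name appears in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(BAYES_RULE.findall(line))
    return counts


def share_of_total(counts):
    """Convert raw counts to percentages of all BAYES_* hits."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rule: 100.0 * n / total for rule, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical log lines; adjust the source to your own log format.
    sample = [
        "result: . -1 - BAYES_00,SPF_PASS",
        "result: Y 15 - BAYES_99,BAYES_999,URIBL_BLACK",
        "result: . 0 - BAYES_00",
    ]
    for rule, n in sorted(bayes_distribution(sample).items()):
        print(f"{n:6d} {rule}")
```

Comparing such a tally before and after manual training gives a rough picture of whether the extreme buckets are filling up again.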




Also;
One more thing...
Some mails even dont have BAYES added in score list, confirmed on 2 
installs


How many?

While you are initially training the Bayes DB and lack adequate ham and 
spam counts, you get no BAYES hits. Also, if you have any rules set to 
"shortcircuit" they can cause SA to stop checking before Bayes is done.


I *think* I've also seen Bayes skip on excess load, with too much lock 
contention on a file-based mechanism like Berkeley DB.
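
The "adequate ham and spam counts" condition can be checked against the debug output with a short Python sketch. The minimum of 200 for each class is, to my knowledge, the default of bayes_min_spam_num and bayes_min_ham_num; treat it as an assumption and adjust if your configuration differs. The function name bayes_is_active is made up for illustration.

```python
import re

CORPUS_RE = re.compile(r"corpus size: nspam = (\d+), nham = (\d+)")


def bayes_is_active(debug_line, min_spam=200, min_ham=200):
    """Return True if the learned counts in a 'corpus size' debug line
    meet both minimums (assumed defaults: 200 spam, 200 ham)."""
    match = CORPUS_RE.search(debug_line)
    if match is None:
        raise ValueError("no 'corpus size' figures found in line")
    nspam, nham = int(match.group(1)), int(match.group(2))
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    line = "dbg: bayes: corpus size: nspam = 1190, nham = 12441"
    print(bayes_is_active(line))
```

With the counts quoted in this thread both minimums are met, so missing BAYES hits there would have to come from something else, such as shortcircuiting or lock contention.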



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: M365 phish with USER_IN_DKIM_WHITELIST

2024-08-30 Thread Alex
>
>
> I'm hoping someone can help me understand how what appears to be an
> invoice
> scam was passed through legitimate MS servers and
> even USER_IN_DKIM_WHITELIST.
>
> USER_IN_DKIM_WHITELIST refers to an explicit (i.e site or user-specific)
> welcomelist, so this you did to yourself...
>
Thanks so much for catching this. I searched for microsoft in my own list
but must have missed that one.

Looks like a good time to prune the historical cruft from the whole
welcomelist.

It's a very scary, realistic phish, for sure.


Re: M365 phish with USER_IN_DKIM_WHITELIST

2024-08-30 Thread Bill Cole

On 2024-08-30 at 13:35:02 UTC-0400 (Fri, 30 Aug 2024 13:35:02 -0400)
Alex 
is rumored to have said:


Hi,
I'm hoping someone can help me understand how what appears to be an 
invoice scam was passed through legitimate MS servers and even
USER_IN_DKIM_WHITELIST.


USER_IN_DKIM_WHITELIST refers to an explicit (i.e site or user-specific) 
welcomelist, so this you did to yourself...



From: Microsoft 


There you go. *You* welcomelisted microsoft.com.

And Microsoft signed and sealed that mail. They believe it is entirely 
legit. They are not actually a reliably trustworthy entity on that 
topic, in fact I'd say they are quite prominently lousy at it.



Date: Fri, 30 Aug 2024 15:50:53 +
Subject: Your Microsoft order on August 30, 2024
Message-ID: 
<1ccff35e-284a-4b08-bef9-737552452...@az.westus3.microsoft.com>

To: rebeccaflam...@rebeccaflaming.onmicrosoft.com

It also hit a few of my local test rules, including one that hits when MS
mail is sent to us with a different To domain, but it received a negative
score because of being on the default DKIM whitelist.


It is NOT on the default list. That would be a hit on the 
USER_IN_DEF_*LIST rules. The only MS domain in the default list is 
accountprotection.microsoft.com. The rest is garbage...



https://pastebin.com/fmjK9AfK


Microsoft signed it. You have a rule that says you trust Microsoft to 
sign only their own non-spam mail.
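
Finding such an entry in a large local configuration is easy to get wrong by eye. The following Python sketch scans configuration lines for site-level DKIM welcomelist entries that mention a given string. The directive names welcomelist_from_dkim and whitelist_from_dkim are the ones I believe feed this rule; the function name and the sample lines are made up for illustration.

```python
# Directives assumed to produce site/user-level DKIM welcomelist hits.
DKIM_LIST_DIRECTIVES = ("welcomelist_from_dkim", "whitelist_from_dkim")


def find_dkim_welcomelist_entries(config_lines, needle):
    """Return (line_number, line) for DKIM welcomelist entries whose
    arguments contain `needle` (case-insensitive)."""
    needle = needle.lower()
    hits = []
    for lineno, raw in enumerate(config_lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        if parts[0] in DKIM_LIST_DIRECTIVES and any(
            needle in arg.lower() for arg in parts[1:]
        ):
            hits.append((lineno, line))
    return hits


if __name__ == "__main__":
    # Hypothetical configuration lines, not taken from the thread.
    sample = [
        "# local lists",
        "welcomelist_from_dkim *@example.com",
        "whitelist_from_dkim *@microsoft.com",
    ]
    for lineno, line in find_dkim_welcomelist_entries(sample, "microsoft"):
        print(f"{lineno}: {line}")
```

Running it over every file the configuration includes, not just local.cf, avoids missing an old entry in a forgotten file.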


Everyone makes trust errors... It's a recurring trope of many lives and 
of history.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: QR phish missed

2024-08-19 Thread Alex
>
>
> dbg: extracttext: [3209409] (/usr/bin/zbarimg) finished: exit 1
> dbg: extracttext: [3209409] (/usr/bin/zbarimg) stderr output: execvp
> failed, errno = 2 (No such file or directory)
> warn: extracttext: error from /usr/bin/zbarimg, please verify
> configuration: execvp failed, errno = 2 (No such file or directory)
>
> but zbarimg is very much at that location:
> # ls -l /usr/bin/zbarimg
> -rwxr-xr-x 1 root root 28344 Apr 14 20:00 /usr/bin/zbarimg
>
> # QR-code decoder
> extracttext_external  zbar  /usr/bin/zbarimg -q -D {}
> extracttext_use       zbar  .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
> add_header            all   ExtractText-Uris _EXTRACTTEXTURIS_
>
> This is also a bit different than the example provided in ExtractText.pm.
> This happens when I run SA on an email as root:
>
> # spamassassin -t -D extracttext < buck-qr-code.eml
>
> # spamassassin --version
> SpamAssassin version 4.0.2-r1919157
>   running on Perl version 5.38.2
>

It seems I was missing Ghostscript and ImageMagick on this system:

# /usr/bin/zbarimg -q -D zbar-image.pdf
execvp failed, errno = 2 (No such file or directory)
Magick: "gs" "-q" "-dBATCH" "-dSAFER" "-dMaxBitmap=5000" "-dNOPAUSE"
"-sDEVICE=ppmraw" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-r900x900"
"-sOutputFile=/tmp/gmgtZP0F" "--" "/tmp/gmlfC5a5" "-c" "quit".
ERROR: Postscript delegate failed (zbar-image.pdf)
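
The misleading part of the error above is that the "No such file or directory" refers to a program zbarimg tries to run, not to zbarimg itself. A quick Python sketch for checking that the helper and the delegate it calls are both on the PATH; the list of names is an assumption based on the output above ("gs" is the delegate named there), and missing_helpers is a made-up name.

```python
import shutil


def missing_helpers(names, path=None):
    """Return the subset of `names` that cannot be found as executables.

    `path` optionally overrides the PATH search string (default: the
    environment's PATH), which matters because a daemon may run with a
    different PATH than an interactive shell.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Names taken from the error output above; extend as needed.
    print(missing_helpers(["zbarimg", "gs"]))
```

An empty list means every named program was found for the current user and PATH.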

After installing them, it extracts the URL, but takes 13s to do it on an
E5-2623 v4 @ 2.60GHz Fedora 40 system:

# time /usr/bin/zbarimg -q -D zbar-image.pdf
QR-Code:
https://lookerstudio.google.com/reporting/7c7d2b18-6e83-4c3c-a275-2eb2e105fa6a?bHN0cmFuZEBidWNra25pdmVzLmNvbQ==

real    0m13.361s
user    0m12.640s
sys     0m1.933s

So it looks like SA kills the helper because it takes too long when run
through SA?
dbg: extracttext: killed stale helper [3229331] (/usr/bin/zbarimg)
info: extracttext: [3229331] (/usr/bin/zbarimg) error: exit 15
dbg: extracttext: [3229331] (/usr/bin/zbarimg) stderr output: Magick:
quitting due to signal 51 (SIGTERM) "Terminated"...

Is there a way to control the timeout value? Or a more optimal method?
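
To see how long a helper really needs outside SA, and what happens when a time limit cuts it off, a Python sketch like the following may help. It is independent of SpamAssassin and only illustrates running an external command under a timeout; time_helper is a made-up name and the 30 second limit in the demo is arbitrary.

```python
import subprocess
import sys
import time


def time_helper(cmd, timeout_s):
    """Run `cmd` (a list of arguments) with a time limit.

    Returns (elapsed_seconds, stdout_text), where stdout_text is None
    if the command was killed for exceeding `timeout_s`.
    """
    start = time.monotonic()
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        output = done.stdout
    except subprocess.TimeoutExpired:
        output = None
    return time.monotonic() - start, output


if __name__ == "__main__":
    # Pass the helper command on the command line, for example:
    #   python3 time_helper.py /usr/bin/zbarimg -q -D zbar-image.pdf
    # Without arguments, the interpreter's own version call is a stand-in.
    command = sys.argv[1:] or [sys.executable, "--version"]
    elapsed, output = time_helper(command, timeout_s=30)
    print(f"{elapsed:.2f}s", "timed out" if output is None else output.strip())
```

If the helper regularly needs longer than the limit SA applies to it, either the limit has to be raised or the work reduced, for example by not rendering whole PDFs at high resolution.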


Re: QR phish missed

2024-08-19 Thread Alex
Hi,

> On Sat, Aug 17, 2024 at 12:14 PM  wrote:
>
>> On 8/16/24 2:03 PM, Alex wrote:
>> > The body was empty with a PDF attachment. It's too big for pastebin.
>> >
>> https://drive.google.com/file/d/1FzBgTKoBgRp7TWkqjWqSqqESYmCGH0G2/view?usp=sharing
>> <
>> https://drive.google.com/file/d/1FzBgTKoBgRp7TWkqjWqSqqESYmCGH0G2/view?usp=sharing
>> >
>> >
>> > Any success stories with setting up zbar for QR code spam would also be
>> appreciated :-)
>>
>> With this rule the QR-code is extracted correctly.
>>
>> extracttext_external  zbar  /usr/local/bin/zbarimg -q -D {}
>> extracttext_use       zbar  .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
>> add_header            all   ExtractText-Uris _EXTRACTTEXTURIS_
>>
>
> Is it possible zbar is competing with pdftotext for which content it
> contains? Looks like it's either unable to identify the image or unable to
> extract the link, perhaps because pdftotext is processing it instead?
>
> X-Spam-ExtractText-Uris:
> X-Spam-ExtractText-Chars: 323
> X-Spam-ExtractText-Words: 35
> X-Spam-ExtractText-Tools: pdftotext
> X-Spam-ExtractText-Types: application/pdf
> X-Spam-ExtractText-Extensions: pdf
> X-Spam-ExtractText-Flags:
>

I must have missed this the first time

dbg: extracttext: [3209409] (/usr/bin/zbarimg) finished: exit 1
dbg: extracttext: [3209409] (/usr/bin/zbarimg) stderr output: execvp
failed, errno = 2 (No such file or directory)
warn: extracttext: error from /usr/bin/zbarimg, please verify
configuration: execvp failed, errno = 2 (No such file or directory)

but zbarimg is very much at that location:
# ls -l /usr/bin/zbarimg
-rwxr-xr-x 1 root root 28344 Apr 14 20:00 /usr/bin/zbarimg

# QR-code decoder
extracttext_external  zbar  /usr/bin/zbarimg -q -D {}
extracttext_use       zbar  .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header            all   ExtractText-Uris _EXTRACTTEXTURIS_

This is also a bit different than the example provided in ExtractText.pm.
This happens when I run SA on an email as root:

# spamassassin -t -D extracttext < buck-qr-code.eml

# spamassassin --version
SpamAssassin version 4.0.2-r1919157
  running on Perl version 5.38.2


Re: QR phish missed

2024-08-19 Thread Alex
Hi,

On Sat, Aug 17, 2024 at 12:14 PM  wrote:

> On 8/16/24 2:03 PM, Alex wrote:
> > The body was empty with a PDF attachment. It's too big for pastebin.
> >
> https://drive.google.com/file/d/1FzBgTKoBgRp7TWkqjWqSqqESYmCGH0G2/view?usp=sharing
> <
> https://drive.google.com/file/d/1FzBgTKoBgRp7TWkqjWqSqqESYmCGH0G2/view?usp=sharing
> >
> >
> > Any success stories with setting up zbar for QR code spam would also be
> appreciated :-)
>
> With this rule the QR-code is extracted correctly.
>
> extracttext_external  zbar  /usr/local/bin/zbarimg -q -D {}
> extracttext_use       zbar  .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
> add_header            all   ExtractText-Uris _EXTRACTTEXTURIS_
>

Is it possible zbar is competing with pdftotext over the same attachment?
Looks like it's either unable to identify the image or unable to extract
the link, perhaps because pdftotext is processing the PDF instead?

X-Spam-ExtractText-Uris:
X-Spam-ExtractText-Chars: 323
X-Spam-ExtractText-Words: 35
X-Spam-ExtractText-Tools: pdftotext
X-Spam-ExtractText-Types: application/pdf
X-Spam-ExtractText-Extensions: pdf
X-Spam-ExtractText-Flags:

Here's my ExtractText.cf. I've verified all paths exist. Hopefully gmail
doesn't truncate the lines. It does hit EXTRACTTEXT.

extracttext_external  pdftotext  /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
extracttext_use       pdftotext  .pdf application/pdf

# http://docx2txt.sourceforge.net
extracttext_external  docx2txt   /usr/local/bin/docx2txt.pl {} -
extracttext_use       docx2txt   .docx application/docx

extracttext_external  antiword   /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
extracttext_use       antiword   .doc application/(?:vnd\.?)?ms-?word.*

extracttext_external  unrtf      /usr/bin/unrtf --nopict {}
extracttext_use       unrtf      .doc .rtf application/rtf text/rtf

extracttext_external  odt2txt    /usr/bin/odt2txt --encoding=UTF-8 {}
extracttext_use       odt2txt    .odt .ott application/.*?opendocument.*text
extracttext_use       odt2txt    .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter

extracttext_external  tesseract  {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
extracttext_use       tesseract  .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)

# QR-code decoder
extracttext_external  zbar       /usr/bin/zbarimg -q -D {}
extracttext_use       zbar       .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header            all        ExtractText-Uris _EXTRACTTEXTURIS_

add_header   all          ExtractText-Flags _EXTRACTTEXTFLAGS_
header       PDF_NO_TEXT  X-ExtractText-Flags =~ /\bpdftotext_NoText\b/
describe     PDF_NO_TEXT  PDF without text
score        PDF_NO_TEXT  0.001

header       DOC_NO_TEXT  X-ExtractText-Flags =~ /\b(?:antiword|openxml|unrtf|odt2txt)_NoText\b/
describe     DOC_NO_TEXT  Document without text
score        DOC_NO_TEXT  0.001

header       EXTRACTTEXT  exists:X-ExtractText-Flags
describe     EXTRACTTEXT  Email processed by extracttext plugin
score        EXTRACTTEXT  0.001


Re: QR phish missed

2024-08-17 Thread giovanni

On 8/16/24 2:03 PM, Alex wrote:

The body was empty with a PDF attachment. It's too big for pastebin.
https://drive.google.com/file/d/1FzBgTKoBgRp7TWkqjWqSqqESYmCGH0G2/view?usp=sharing 


Any success stories with setting up zbar for QR code spam would also be 
appreciated :-)


With this rule the QR-code is extracted correctly.

extracttext_external  zbar  /usr/local/bin/zbarimg -q -D {}
extracttext_use       zbar  .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header            all   ExtractText-Uris _EXTRACTTEXTURIS_

 Cheers
  Giovanni




Re: QR phish missed

2024-08-16 Thread Bill Cole

On 2024-08-16 at 08:03:05 UTC-0400 (Fri, 16 Aug 2024 08:03:05 -0400)
Alex 
is rumored to have said:


It says that SPF failed, but SPF_PASS was hit, presumably from our
connection to Microsoft, not their connection to the spammer client:


Correct. You can only check SPF on the first SMTP transaction guided by 
an MX record and recorded by a trusted server.


Received-SPF: Fail (protection.outlook.com: domain of toppersrvs.com does not
 designate 35.230.39.135 as permitted sender)
 receiver=protection.outlook.com;
 client-ip=35.230.39.135; helo=[127.0.0.1];

Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=52.100.167.207;
 helo=nam12-mw2-obe.outbound.protection.outlook.com;
 envelope-from=administra...@toppersrvs.com; receiver=buckknives.com

ARC also failed:
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=fail (sender ip is
 35.230.39.135) smtp.rcpttodomain=buckknives.com
 smtp.mailfrom=toppersrvs.com;
 dmarc=none action=none header.from=toppersrvs.com; dkim=none (message not
 signed); arc=none (0)

Should I also somehow be checking these SPF failures?


You really can't with SA, because it is not generally safe to trust the 
Received headers written by systems you don't control or have some sort 
of explicit relaying arrangement with. Because the initial submission of 
messages CANNOT be subjected to SPF tests, you don't want to test 
transactions that are not following an MX record.
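
To illustrate the distinction, here is a Python sketch that separates Received-SPF results recorded by your own receiver from those written by someone else's systems. It is only an illustration of the idea: header text can be forged, so in practice what matters is where a header sits relative to your trusted relays, not just what its receiver= field claims. The function names are made up, and the sample values are abridged from the headers quoted above.

```python
import re

PARAM_RE = re.compile(r"([A-Za-z-]+)=\s*([^;\s]+)")


def parse_received_spf(value):
    """Split a Received-SPF header value into (result, params dict)."""
    result = value.split(None, 1)[0].lower()
    params = {key.lower(): val for key, val in PARAM_RE.findall(value)}
    return result, params


def own_spf_results(values, my_receivers):
    """Keep only results whose receiver= names one of our own systems."""
    kept = []
    for value in values:
        result, params = parse_received_spf(value)
        if params.get("receiver", "").lower() in my_receivers:
            kept.append((result, params.get("client-ip")))
    return kept


if __name__ == "__main__":
    headers = [
        "Fail (protection.outlook.com: domain of toppersrvs.com does not "
        "designate 35.230.39.135 as permitted sender) "
        "receiver=protection.outlook.com; client-ip=35.230.39.135; "
        "helo=[127.0.0.1];",
        "Pass (mailfrom) identity=mailfrom; client-ip=52.100.167.207; "
        "helo=nam12-mw2-obe.outbound.protection.outlook.com; "
        "receiver=buckknives.com",
    ]
    print(own_spf_results(headers, {"buckknives.com"}))
```

The Fail line written by Microsoft's side is dropped here for the reason given above: it describes a hop the receiving site did not observe itself.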




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com 
addresses)

Not Currently Available For Hire


Re: What is RP? many false negatives and dont respond to emails

2024-08-13 Thread Matus UHLAR - fantomas

On 13.08.24 15:18, Philipp Ewald wrote:

Thanks, it was on hold. I will upgrade it.


configuring (daily) rule updates could be enough.
Of course, upgrading SpamAssassin is better than not upgrading it.


On 13.08.24 13:17, Axb wrote:

On 8/13/24 11:37, Philipp Ewald wrote:

Users are getting spam with a score of -5 because of this...
Any other experiences? Do they answer e-mails? Mine have gone unanswered for weeks.


 RCVD_IN_RP_CERTIFIED=-3, RCVD_IN_RP_RNBL=1.31, RCVD_IN_RP_SAFE=-2]

many thanks



Are you using an ancient SA version?
Those rules were removed/changed in March 2021


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"Two words: Windows survives." - Craig Mundie, Microsoft senior strategist
"So does syphillis. Good thing we have penicillin." - Matthew Alton


Re: What is RP? many false negatives and dont respond to emails

2024-08-13 Thread Philipp Ewald

Thanks, it was on hold. I will upgrade it.

On 13.08.24 13:17, Axb wrote:

On 8/13/24 11:37, Philipp Ewald wrote:

Users are getting spam with a score of -5 because of this...
Any other experiences? Do they answer e-mails? Mine have gone unanswered for weeks.


 RCVD_IN_RP_CERTIFIED=-3, RCVD_IN_RP_RNBL=1.31, RCVD_IN_RP_SAFE=-2]

many thanks



Are you using an ancient SA version?
Those rules were removed/changed in March 2021



--
Philipp Ewald
Administrator

DigiOnline GmbH, Probsteigasse 15 - 19, 50670 Köln
Fax: +49 221 6500-690, E-Mail: philipp.ew...@digionline.de

AG Köln HRB 27711, St.-Nr. 5215 5811 0640
Geschäftsführer: Werner Grafenhain

Informationen zum Datenschutz: www.digionline.de/ds


Re: What is RP? many false negatives and dont respond to emails

2024-08-13 Thread Axb

On 8/13/24 11:37, Philipp Ewald wrote:

Users are getting spam with a score of -5 because of this...
Any other experiences? Do they answer e-mails? Mine have gone unanswered for weeks.


     RCVD_IN_RP_CERTIFIED=-3, RCVD_IN_RP_RNBL=1.31, RCVD_IN_RP_SAFE=-2]

many thanks



Are you using an ancient SA version?
Those rules were removed/changed in March 2021


