Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?

2022-05-07 Thread Paul Pace

On 2022-05-07 10:37, Matija Nalis wrote:

On Sat, May 07, 2022 at 09:35:31AM -0700, Paul Pace wrote:

On 2022-05-07 07:53, Benny Pedersen wrote:
> On 2022-05-07 16:42, Paul Pace wrote:
> >   *   10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
> >   *  blocklist
> >   *  [URIs: wikileaksdotorg]

The problem with this solution is I don't know which domain is going to be
next, plus I'm not so much looking for a solution to this specific result,
but rather I want to understand why there is a disparity between what
SpamAssassin is reporting and what the Spamhaus website is reporting.


If you do:

grep -r URIBL_SBL /var/lib/spamassassin/
you'll see it does this:

/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:uridnssub  URIBL_SBL  zen.spamhaus.org.  A  127.0.0.2
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:body       URIBL_SBL  eval:check_uridnsbl('URIBL_SBL')
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:describe   URIBL_SBL  Contains an URL's NS IP listed in the Spamhaus SBL blocklist

which means if it wanted to check (for example) 195.35.109.44 it would do
a DNS A record lookup on "44.109.35.195.zen.spamhaus.org" (note the reversed
quads), and check whether the result is "127.0.0.2" (which happens to be true
in this case at the moment, but might not be some time later):

% host -t a 44.109.35.195.zen.spamhaus.org
44.109.35.195.zen.spamhaus.org has address 127.0.0.2

The same procedure can be used for other RBLs.

As to why the web lookup returns a different result: it might be because
the DNS result was cached earlier (maybe by some previous spam message),
and/or because you did not look it up fast enough. Data on RBL
servers changes all the time, and there is usually a delay between
their current database (which is likely what the web interface looks
up directly) and their published DNS records (which lag behind it).

Anyway, if you do the DNS check at the same time as SpamAssassin does it
(or very close; I think the default TTL there is 60 seconds), you should
get the same result. If you do it minutes or hours later, the results
might be different again (how often they change depends on the RBL in
question, as well as your luck).


Thank you, this is exactly what I was looking for. Using dig it looks
like the TTL is 2100.


Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?

2022-05-07 Thread Matija Nalis
On Sat, May 07, 2022 at 09:35:31AM -0700, Paul Pace wrote:
> On 2022-05-07 07:53, Benny Pedersen wrote:
> > On 2022-05-07 16:42, Paul Pace wrote:
> > >   *   10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
> > >   *  blocklist
> > >   *  [URIs: wikileaksdotorg]
> 
> The problem with this solution is I don't know which domain is going to be
> next, plus I'm not so much looking for a solution to this specific result,
> but rather I want to understand why there is a disparity between what
> SpamAssassin is reporting and what the Spamhaus website is reporting.

If you do:

grep -r URIBL_SBL /var/lib/spamassassin/
you'll see it does this:

/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:uridnssub  URIBL_SBL  zen.spamhaus.org.  A  127.0.0.2
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:body       URIBL_SBL  eval:check_uridnsbl('URIBL_SBL')
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:describe   URIBL_SBL  Contains an URL's NS IP listed in the Spamhaus SBL blocklist

which means if it wanted to check (for example) 195.35.109.44 it would do
a DNS A record lookup on "44.109.35.195.zen.spamhaus.org" (note the reversed quads),
and check whether the result is "127.0.0.2" (which happens to be true in this case
at the moment, but might not be some time later):

% host -t a 44.109.35.195.zen.spamhaus.org
44.109.35.195.zen.spamhaus.org has address 127.0.0.2
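The reversed-quad query is easy to reproduce outside of SpamAssassin. Here is a minimal Python sketch of the same lookup; the helper names `rbl_query_name` and `rbl_lookup` are mine, not part of SpamAssassin or any Spamhaus API:

```python
import socket

def rbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL query name: reverse the quads, append the zone."""
    quads = ip.split(".")
    return ".".join(reversed(quads)) + "." + zone

def rbl_lookup(ip, zone="zen.spamhaus.org"):
    """Return the A record for the DNSBL query, or None if not listed.

    NXDOMAIN (socket.gaierror) means the IP is not on the list.
    """
    try:
        return socket.gethostbyname(rbl_query_name(ip, zone))
    except socket.gaierror:
        return None

# The query name SpamAssassin would look up for 195.35.109.44:
print(rbl_query_name("195.35.109.44"))  # 44.109.35.195.zen.spamhaus.org
```

Note that public resolvers (e.g. 8.8.8.8) are blocked by Spamhaus, so a live `rbl_lookup` call only gives meaningful answers through your own resolver.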

The same procedure can be used for other RBLs.

As to why the web lookup returns a different result: it might be because
the DNS result was cached earlier (maybe by some previous spam message),
and/or because you did not look it up fast enough. Data on RBL
servers changes all the time, and there is usually a delay between
their current database (which is likely what the web interface looks
up directly) and their published DNS records (which lag behind it).

Anyway, if you do the DNS check at the same time as SpamAssassin does it
(or very close; I think the default TTL there is 60 seconds), you should
get the same result. If you do it minutes or hours later, the results
might be different again (how often they change depends on the RBL in
question, as well as your luck).

-- 
Opinions above are GNU-copylefted.


Re: SPF skipped for whitelisted relay domain

2022-05-07 Thread Alex
> >I'm trying to understand why some domains are not whitelisted even
> >though they pass SPF and are in my local welcomelist_auth entries. I'm
> >using policyd-spf with postfix, and it appears to be adding the
> >following header:
> >
> >X-Comment: SPF skipped for whitelisted relay domain -
> >client-ip=13.110.6.221; helo=smtp14-ph2-sp4.mta.salesforce.com;
> >envelope-from=re...@support.meridianlink.com; receiver=
>
> you seem to have the domain listed in the policyd-spf whitelist.
> salesforce.com probably?

I figured out where it's whitelisted, but still don't understand how it works.

It's somehow referencing the postscreen access list I'm using:

postscreen_access_list =
permit_mynetworks, cidr:$config_directory/postscreen_access.cidr

In that file are cidr entries like:
13.110.208.0/21 permit
13.110.216.0/22 permit
13.110.224.0/20 permit

This file is auto-generated from my postwhite script that gathers IPs
for the "too big to fail" providers like salesforce and google and
microsoft.

which match the client IP for salesforce:
client-ip=13.110.6.221; helo=smtp14-ph2-sp4.mta.salesforce.com
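The mechanics of a Postfix cidr-table match can be sketched with Python's `ipaddress` module (first match wins, as in Postfix). One caveat worth noticing: the three sample ranges quoted above happen not to contain 13.110.6.221, so the entry that actually matches must be elsewhere in the generated file. The IP 13.110.210.5 below is a hypothetical address inside the first range, used only for illustration:

```python
import ipaddress

# CIDR entries as in postscreen_access.cidr (all with a "permit" action)
permit_cidrs = [
    ipaddress.ip_network("13.110.208.0/21"),
    ipaddress.ip_network("13.110.216.0/22"),
    ipaddress.ip_network("13.110.224.0/20"),
]

def is_permitted(client_ip):
    """Return True if client_ip falls inside any permitted CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in permit_cidrs)

print(is_permitted("13.110.210.5"))  # True  (inside 13.110.208.0/21)
print(is_permitted("13.110.6.221"))  # False (not in these three ranges)
```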

I was aware of this access list, but I wasn't aware that the policy
daemon was using it as well as postscreen.

The problem now is that I don't know _how_ it's using it, and how to
prevent it from affecting my welcomelist_auth entries. I don't see any
reference in the code that would indicate it's somehow getting this
info from postscreen/postfix and using it when making these decisions.

The unmodified original messages also no longer pass SPF - shouldn't
they? It does still pass DKIM from the command-line, and therefore my
welcomelist_auth entry, but not when it's first received.

There was a reason I added this email to the welcomelist in the first
place. Perhaps a temporary solution would be to just remove the
postscreen access lists for now? Other ideas? Would someone like to
help me troubleshoot this? I'm thinking the fact that the IP is
whitelisted in postscreen is somehow being passed through the socket
to policyd-spf in a structure somewhere.

> >My welcomelist entry in SA for this specific email is as:
> >welcomelist_auth re...@support.meridianlink.com
>
> is this in spamassassin's local.cf ?

Yes

> >salesforce is also listed in their SPF record:
> >$ dig +short txt support.meridianlink.com
> >"v=spf1 include:spf.protection.outlook.com include:_spf.salesforce.com -all"
>
> SPF_PASS indicates that the SPF hit.
>
> however, posting full headers could help us a bit.

https://pastebin.com/TvTx6KzY

$ spamassassin --version
SpamAssassin version 4.0.0-r1889518
  running on Perl version 5.32.1


Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?

2022-05-07 Thread Paul Pace

On 2022-05-07 07:53, Benny Pedersen wrote:

On 2022-05-07 16:42, Paul Pace wrote:

I have set up SpamAssassin with the following in
/etc/spamassassin/mycustomscores.cf:



*   10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
*  blocklist
*  [URIs: wikileaksdotorg]


add to /etc/spamassassin/mycustomskipuribl.cf:

skip_uribl_domains wikileaksdotorg


The problem with this solution is I don't know which domain is going to 
be next, plus I'm not so much looking for a solution to this specific 
result, but rather I want to understand why there is a disparity between 
what SpamAssassin is reporting and what the Spamhaus website is 
reporting.




or reduce spamhaus score


With this I will get more spam in my inbox, especially spam sent from 
compromised accounts which usually have lots of positive modifiers.


Re: Spamassassin with Galera as SQL-Backend?

2022-05-07 Thread deano-spamassassin
 

On 2022-05-06 6:56 am, Henrik K wrote: 

On Fri, May 06, 2022 at 12:31:47PM +0200, giovanni@paclan.it wrote:
> On 5/6/22 11:08, Niels Kobschätzki wrote:
> > Hi, I have a setup where the spamassassin-servers have actually no
> > access to the data of the mail-servers. Now I was looking into having
> > per-user bayes-databases and saw that I can do that with a SQL-database.
> > I have already a small galera-cluster and I wonder if spamassassin will
> > work with it because of the limitations galera has. The limitations are:
> > * only innodb
> > * unsupported explicit locking
> > * a primary key on all tables is necessary
> > * no XA transactions
> > * no reliance on auto-increment
> > Does anyone have experience with such a setup?
> Few things to consider: bayes_expire has no primary key.

From what I see, there's no reason why it shouldn't be.

CREATE TABLE bayes_expire (
 id int(11) NOT NULL default '0',
 runtime int(11) NOT NULL default '0',
 KEY bayes_expire_idx1 (id)
) ENGINE=InnoDB;

BayesStore/MySQL.pm has kind of a dumb insert which might insert things
multiple times

 my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)";

It should just be converted to UPSERT.

Of course this won't help until 4.0.0 is released..
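In MySQL the upsert form would be `INSERT ... ON DUPLICATE KEY UPDATE`. As a self-contained sketch of the idea (not the actual BayesStore code), here is the equivalent pattern using Python's stdlib `sqlite3`, whose SQLite dialect spells it `ON CONFLICT ... DO UPDATE`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE bayes_expire (id INTEGER PRIMARY KEY, runtime INTEGER NOT NULL)"
)

def upsert_expire(con, uid, runtime):
    # SQLite equivalent of MySQL's:
    #   INSERT INTO bayes_expire (id,runtime) VALUES (?,?)
    #     ON DUPLICATE KEY UPDATE runtime = VALUES(runtime);
    con.execute(
        "INSERT INTO bayes_expire (id, runtime) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET runtime = excluded.runtime",
        (uid, runtime),
    )

upsert_expire(con, 1, 100)
upsert_expire(con, 1, 200)  # updates the existing row instead of duplicating it
rows = con.execute("SELECT id, runtime FROM bayes_expire").fetchall()
print(rows)  # [(1, 200)]
```

With a plain INSERT the second call would either duplicate the row or fail on the primary key; the upsert makes the operation idempotent, which is what replication setups like Galera want.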

> bayes_vars MySQL table has the id defined as "id int(11) NOT NULL 
> AUTO_INCREMENT".

Google implies Galera supports auto_increment just fine, it just does
something funny like incrementing them in multiples of 3 or something.

It works fine with Galera - been running that for years. This is from my
ansible spamassassin role 

# http://blog.secaserver.com/2013/10/converting-data-work-galera-cluster/
- name: Set bayes_expire key to PRIMARY so galera replication works
  lineinfile:
    path: /usr/share/doc/spamassassin/sql/bayes_mysql.sql
    regexp: 'KEY (bayes_expire_idx1.*)'
    line: 'PRIMARY KEY 1'
    backrefs: yes
    state: present

# NOTE: As of 3.4.3 see UPGRADE file - says to add last_hit field to awl table
# exactly as we're doing here
# lastupdate timestamp default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
# http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html
- name: Add last_hit field to bayes_seen and awl tables
  lineinfile:
    path: "/usr/share/doc/spamassassin/sql/{{ item.file }}"
    insertbefore: '.*({{ item.before }}).*'
    line: ' last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,'
    state: present
  with_items:
    - { before: 'id,msgid', file: bayes_mysql.sql }
    - { before: 'username,email,signedby,ip', file: awl_mysql.sql }

# Only run the DB creation tasks on a single host
# This depends on mysql/galera replication being active to propagate
# the DB across to the other nodes
- when: inventory_hostname == groups.testmail[0] or play_hosts | length == 1
  block:
    #
    - name: Create Spamassassin database
      mysql_db:
        name: spamassassin
        state: present
      register: spamassassin_database

Just setting the bayes_expire key to PRIMARY was all that was needed. 

Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?

2022-05-07 Thread Benny Pedersen

On 2022-05-07 16:42, Paul Pace wrote:

I have set up SpamAssassin with the following in
/etc/spamassassin/mycustomscores.cf:



*   10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
*  blocklist
*  [URIs: wikileaksdotorg]


add to /etc/spamassassin/mycustomskipuribl.cf:

skip_uribl_domains wikileaksdotorg

or reduce spamhaus score


Spamhaus spurious positives - how does SpamAssassin check Spamhaus?

2022-05-07 Thread Paul Pace
I have set up SpamAssassin with the following in 
/etc/spamassassin/mycustomscores.cf:


score RCVD_IN_SBL   10.0
score RCVD_IN_XBL   10.0
score RCVD_IN_PBL   10.0
score RCVD_IN_SBL_CSS   10.0
score URIBL_SBL 10.0
score URIBL_CSS 10.0
score URIBL_CSS_A   10.0
score URIBL_SBL_A   10.0

I do not otherwise block using Spamhaus at the MTA or elsewhere.

I occasionally see false positives because of these scores, and it is
always when a domain is in the body of a message. When I check the Spamhaus
website[1], the domain is not there. Each time this has occurred, it has
been for a website currently in the news, usually something to do with
politics.


A few days ago I happened to be on my computer exactly when one of these 
false positives came in[2]. I immediately went and checked the Spamhaus 
site and the domain was not listed. I checked several times throughout 
the day and never saw the domain there.


So I am trying to figure out why there is a disparity between what
SpamAssassin reports and what the Spamhaus website reports, but I'm not clear
on how SpamAssassin checks Spamhaus, and since these are usually domains I
rarely see in a message, I don't have a good feel for whether or not this is
a regular problem.


If anyone can point me to how this check is performed, that would be 
very helpful.


Thank you,


Paul

[1] https://check.spamhaus.org/
[2] Scores:
*   10 URIBL_SBL_A Contains URL's A record listed in the Spamhaus SBL
*  blocklist
*  [URIs: wikileaksdotorg]
*   10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
*  blocklist
*  [URIs: wikileaksdotorg]


Re: IPv6 issue

2022-05-07 Thread Grant Taylor

On 5/7/22 1:55 AM, Ted Mittelstaedt wrote:

I used to greylist and it helped a lot.


I used to use greylisting too.  I've found nolisting to be equally
effective.


2FA killed that, however.  When someone logs into a website, bank, 
etc. quite often they use an email address as the second factor - 
so for that to work the email has to be delivered instantaneously. 
Also most 2FA does not follow any kind of SMTP standard; they will 
attempt delivery once and not retry if it fails.


Nolisting tends to benefit from this in that the sender is able to use 
the next server in line as soon as it can establish the connection a 
fraction of a second later.


Once 2FA became a big deal for the banks I got far too many user 
complaints on the greylisting to keep it.


I've not knowingly had any problems with nolisting like I used to have 
with greylisting.




--
Grant. . . .
unix || die





Re: IPv6 issue

2022-05-07 Thread Benny Pedersen

On 2022-05-07 09:55, Ted Mittelstaedt wrote:


Once 2FA became a big deal for the banks I got far too many user
complaints on the greylisting to keep it.


2fa should NOT be done on email

idiotic banks :)


Re: IPv6 issue

2022-05-07 Thread Benny Pedersen

On 2022-05-07 02:39, Greg Troxel wrote:

I agree with what Grant said.

Also, I wonder how much greylisting would help, and if you were already
doing that.  The data I posted is for a machine that already does
greylisting in general, with varying times depending on inclusion in
various RBLs and local data.

I find that delaying connections from unknown places even 2 minutes
helps a lot.


i use sqlgrey with 60 mins, but only for recipients that can live with 
delays :=)


long delays help RBLs learn more spammers, so mail retried after the 
greylist delay is more likely to already be known as spam


i can't use postscreen for this policy, so yes, in some setups both 
postscreen and sqlgrey together is super


in sqlgrey i have an IP whitelist of known mailing lists to not delay, and 
recipients listed in postgresql where they opt out; all the best defense is 
there







Re: IPv6 issue

2022-05-07 Thread Ted Mittelstaedt

I used to greylist and it helped a lot.

2FA killed that, however.  When someone logs into a website, bank, etc. 
quite often they use an email address as the second factor - so for that
to work the email has to be delivered instantaneously.  Also most 2FA 
does not follow any kind of SMTP standard; they will attempt delivery 
once and not retry if it fails.


Once 2FA became a big deal for the banks I got far too many user 
complaints on the greylisting to keep it.


Ted

On 5/6/2022 5:39 PM, Greg Troxel wrote:


I agree with what Grant said.

Also, I wonder how much greylisting would help, and if you were already
doing that.  The data I posted is for a machine that already does
greylisting in general, with varying times depending on inclusion in
various RBLs and local data.

I find that delaying connections from unknown places even 2 minutes
helps a lot.