Re: Issues with Yahoo/AOL emails and RCVD_NUMERIC_HELO

2018-07-31 Thread Sebastian Arcus



On 29/07/18 19:21, RW wrote:

On Sun, 29 Jul 2018 19:00:56 +0100
Dominic Raferd wrote:


On Sun, 29 Jul 2018 at 18:33, RW  wrote:


On Sun, 29 Jul 2018 12:28:08 +0200
Antony Stone wrote:
  

On Sunday 29 July 2018 at 12:17:07, Sebastian Arcus wrote yet
another email that's guaranteed to fail DMARC with a reject when
posted through a mailing list, and consequently I didn't
receive:

...
  


Ditto, and I haven't received (and won't receive) any of his
subsequent postings either (opendmarc is - quite rightly - blocking
them). More strangely, I didn't receive this message (above) except
apparently when quoted in reply by RW. Note to OP: when posting to
mailing lists, use a domain that does not have DMARC with p=reject
(and preferably not p=quarantine either).


Actually it's worse than that, the main problem (the last I looked) is
that his DKIM signs some List-* headers which guarantees a DKIM fail
when he posts through a mailing list.


I had no idea that DKIM signing can be such a nightmare. I have disabled 
all DKIM for the time being until I can get my head around how to 
configure it properly - if that is even possible. Thank you for pointing 
it out - I wasn't aware of the issue.
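
To illustrate RW's point about signed List-* headers, here is a minimal Python sketch (not taken from any DKIM library) that reads the h= tag of a DKIM-Signature value and reports any List-* header names it covers. The sample signature value and the function names are assumptions made up for the example.

```python
def signed_headers(dkim_signature: str) -> list[str]:
    """Return the lower-cased header names listed in the h= tag."""
    # Tags are separated by ';' and the h= value is a ':'-separated list of names.
    for tag in dkim_signature.split(";"):
        name, _, value = tag.strip().partition("=")
        if name.strip() == "h":
            return [h.strip().lower() for h in value.split(":") if h.strip()]
    return []


def list_headers_signed(dkim_signature: str) -> list[str]:
    """Signed header names that a list manager normally adds itself."""
    return [h for h in signed_headers(dkim_signature) if h.startswith("list-")]


# Hypothetical signature value, for illustration only.
example = (
    "v=1; a=rsa-sha256; d=example.org; s=mail; "
    "h=From:To:Subject:Date:List-Id:List-Unsubscribe; bh=abc=; b=def="
)
print(list_headers_signed(example))
```

If the list comes back non-empty, the signature covers headers that the list software will add later, which is the failure mode described above.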


Re: Issues with Yahoo/AOL emails and RCVD_NUMERIC_HELO

2018-07-29 Thread Sebastian Arcus



On 29/07/18 19:00, Dominic Raferd wrote:



On Sun, 29 Jul 2018 at 18:33, RW <rwmailli...@googlemail.com> wrote:


On Sun, 29 Jul 2018 12:28:08 +0200
Antony Stone wrote:

 > On Sunday 29 July 2018 at 12:17:07, Sebastian Arcus wrote yet another
 > email that's guaranteed to fail DMARC with a reject when posted
 > through a mailing list, and consequently I didn't receive:
...


Ditto, and I haven't received (and won't receive) any of his subsequent 
postings either (opendmarc is - quite rightly - blocking them). More 
strangely, I didn't receive this message (above) except apparently when 
quoted in reply by RW. Note to OP: when posting to mailing lists, use a 
domain that does not have DMARC with p=reject (and preferably not 
p=quarantine either).


Thank you for highlighting this - I wasn't aware of the problem. I had 
no idea that enabling DMARC fixes one set of problems while creating a 
whole different one! I've disabled DMARC for the time being, until I find a 
workable solution.
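
As a rough way to check a domain before posting to a list, the following Python sketch parses a DMARC TXT record and reports whether its policy is one of the two Dominic warns about. The record strings are made-up examples, and treating a missing p= tag as "none" is an assumption of this sketch rather than something stated in the thread.

```python
def dmarc_policy(record: str) -> str:
    """Return the p= policy of a DMARC TXT record (assumed 'none' if absent)."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip().lower()
    if tags.get("v") != "dmarc1":
        raise ValueError("not a DMARC record")
    return tags.get("p", "none")


def risky_for_mailing_lists(record: str) -> bool:
    """True when the policy asks receivers to reject or quarantine failures."""
    return dmarc_policy(record) in {"reject", "quarantine"}


# Hypothetical records, for illustration only.
print(risky_for_mailing_lists("v=DMARC1; p=reject; rua=mailto:postmaster@example.org"))
print(risky_for_mailing_lists("v=DMARC1; p=none"))
```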


Re: Issues with Yahoo/AOL emails and RCVD_NUMERIC_HELO

2018-07-29 Thread Sebastian Arcus

On 29/07/18 14:36, Matus UHLAR - fantomas wrote:

On Sunday 29 July 2018 at 12:17:07, Sebastian Arcus wrote:

I've been having a number of emails recently from Yahoo and AOL senders
hitting the RCVD_NUMERIC_HELO rule. I'm trying to understand what is
going on:

1. First off, the rule hits on the EHLO line - which means it is an
authenticated SMTP submission.



On 29/07/18 11:28, Antony Stone wrote:

Er, what?

No, EHLO simply means "Hello, I'm capable of doing ESMTP".


On 29.07.18 12:29, Sebastian Arcus wrote:
Looking again at it - the 82.132.242.82 is registered as O2/Telefonica 
wireless broadband. I wonder if this is a 3G/4G connection - which in 
UK always has a private IP address - at the mobile phone level. Maybe 
that's why the confusion - the MUA on the mobile phone thinks it is 
10.7.54.227 (which it is), but the Yahoo server can only see the 
public IP 82.132.242.82, which belongs to the O2 gateway. Could that 
explain that particular header?


it does.
Received: from 82.132.242.82 (EHLO [10.7.54.227]) ([82.132.242.82])
  by smtp409.mail.ir2.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 84be422cfd662692400891131b957bd8
  for ;
  Mon, 23 Jul 2018 13:59:41 +0000 (UTC)

Looking at /usr/share/perl5/Mail/SpamAssassin/Plugin/RelayEval.pm
I guess it should not match:

  my $rcvd = $pms->{relays_untrusted_str};

  if ($rcvd) {
    my $IP_ADDRESS = IPV4_ADDRESS;
    my $IP_PRIVATE = IP_PRIVATE;
    local $1;
    if ($rcvd =~ /\bhelo=($IP_ADDRESS)(?=[\000-\040,;\[()<>]|\z)/i  # Bug 5878
        && $1 !~ /$IP_PRIVATE/) {
      return 1;
    }

but maybe I read wrong. Which SA version do you have?


I have:

# spamassassin --version
SpamAssassin version 4.0.0-r1823176
  running on Perl version 5.26.2
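
For readers who want to experiment with the logic Matus quotes from RelayEval.pm, here is a rough Python analogue. It is not SpamAssassin's code: the relay strings are simplified stand-ins for the real relays_untrusted_str format, the IPv4 pattern is looser than IPV4_ADDRESS, and the standard ipaddress module's is_private test is only an approximation of IP_PRIVATE.

```python
import ipaddress
import re

# Dotted-quad IPv4 right after "helo=", followed by a delimiter or end of
# string, loosely mirroring the look-ahead in the quoted Perl excerpt.
HELO_IPV4 = re.compile(
    r"\bhelo=(\d{1,3}(?:\.\d{1,3}){3})(?=[\x00-\x20,;\[()<>]|\Z)", re.I
)


def numeric_helo(relay_str: str) -> bool:
    """True if the first numeric HELO found is a non-private IPv4 address."""
    match = HELO_IPV4.search(relay_str)
    if not match:
        return False
    try:
        addr = ipaddress.ip_address(match.group(1))
    except ValueError:
        return False  # looked like a dotted quad but is not a valid address
    return not addr.is_private


# Simplified, hypothetical relay strings using the addresses from this thread.
print(numeric_helo("[ ip=82.132.242.82 helo=10.7.54.227 ]"))
print(numeric_helo("[ ip=82.132.242.82 helo=82.132.242.82 ]"))
```

Under these assumptions a HELO of 10.7.54.227 is skipped because it is a private address, which matches Matus's reading that the rule should not hit on it.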



Re: Issues with Yahoo/AOL emails and RCVD_NUMERIC_HELO

2018-07-29 Thread Sebastian Arcus



On 29/07/18 11:28, Antony Stone wrote:

On Sunday 29 July 2018 at 12:17:07, Sebastian Arcus wrote:


I've been having a number of emails recently from Yahoo and AOL senders
hitting the RCVD_NUMERIC_HELO rule. I'm trying to understand what is
going on:

1. First off, the rule hits on the EHLO line - which means it is an
authenticated SMTP submission.


Er, what?

No, EHLO simply means "Hello, I'm capable of doing ESMTP".


Thank you - I clearly got that one wrong.

Looking again at it - the 82.132.242.82 is registered as O2/Telefonica 
wireless broadband. I wonder if this is a 3G/4G connection - which in UK 
always has a private IP address - at the mobile phone level. Maybe 
that's why the confusion - the MUA on the mobile phone thinks it is 
10.7.54.227 (which it is), but the Yahoo server can only see the public 
IP 82.132.242.82, which belongs to the O2 gateway. Could that explain 
that particular header?




>> After all, if it is EHLO, it probably is an MUA,
>
> No; MTAs also speak E/SMTP to each other, and some of those Received headers
> indicating handover of the mail from one server to another will contain the
> HELO or EHLO greetings.
>
>> 2. Or maybe this is caused by Yahoo's end - in which case would some
>> sort of exception be a good idea?
>
> Yes, I would do that.
>
>> Or maybe I am misunderstanding completely what is going on? I've
>> uploaded a set of headers here: https://pastebin.com/KDV1f0wW
>
> Given that the example you've posted is from a machine with a public IP
> 82.132.242.82, but thinks it has a private IP 10.7.54.227, I'm not entirely
> surprised there is no rDNS set up for the private address.


Issues with Yahoo/AOL emails and RCVD_NUMERIC_HELO

2018-07-29 Thread Sebastian Arcus
I've been having a number of emails recently from Yahoo and AOL senders 
hitting the RCVD_NUMERIC_HELO rule. I'm trying to understand what is 
going on:


1. First off, the rule hits on the EHLO line - which means it is an 
authenticated SMTP submission. Is the correct HELO format important when 
the client actually does authenticated SMTP? After all, if it is EHLO, 
it probably is an MUA, which can't be expected to have proper DNS etc.


2. Or maybe this is caused by Yahoo's end - in which case would some 
sort of exception be a good idea?


Or maybe I am misunderstanding completely what is going on? I've 
uploaded a set of headers here: https://pastebin.com/KDV1f0wW


Thank you for any useful hints.


Re: SPF_HELO_FAIL triggers on domain with valid SPF record and HELO settings

2018-06-11 Thread Sebastian Arcus



On 11/06/18 08:56, Sebastian Arcus wrote:
I am running SA 4.0.0-r1823176 on Perl 5.26.2. On a number of domains I 
administer, outbound mail triggers the SPF_HELO_FAIL rule - but the 
regular SPF check passes. I am struggling to see why this is happening, 
as the HELO name is set to the same value as the name of the server/dns 
name, it has rDNS - and it clearly passes during the regular SPF check - 
but not the SPF_HELO check. I have re-checked the domain settings at 
mxtoolbox.com - and there doesn't seem to be any problem. Any ideas please?


It turns out that it is indeed something I did. Somehow in all this time 
since I started to use SPF, I never realised that SPF checks are also 
done on the HELO hostname itself, not only the sending domain - and the 
need to have a separate SPF record for it.


I actually had a separate SPF record for mail.sinclair-accounting.co.uk, 
in which I denied everything - as my understanding was that there will 
never be an address of the type u...@mail.sinclair-accounting.co.uk - so 
I wouldn't need to allow anything on SPF.


All corrected now - thank you for the input.
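
The difference between the two checks can be reproduced with a very small SPF evaluator. The Python sketch below only understands ip4: and all mechanisms (anything needing DNS, such as a, mx or include, is skipped), and both record strings are assumptions: the thread only shows that '-all' matched for the HELO identity, not the exact records before or after the fix.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate an SPF record that uses only ip4: and all mechanisms."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mech = term.lower()
        if mech == "all":
            return QUALIFIERS[qualifier]
        if mech.startswith("ip4:"):
            if ip in ipaddress.ip_network(mech[4:], strict=False):
                return QUALIFIERS[qualifier]
        # Other mechanisms need DNS lookups and are ignored in this sketch.
    return "neutral"


# Hypothetical TXT records for the HELO hostname, before and after the fix.
print(check_spf("v=spf1 -all", "80.229.84.190"))
print(check_spf("v=spf1 ip4:80.229.84.190 -all", "80.229.84.190"))
```

A deny-everything record on the HELO hostname fails for every connecting IP, including the server's own, which is consistent with SPF_HELO_FAIL firing while the envelope-from check passed.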


Re: SPF_HELO_FAIL triggers on domain with valid SPF record and HELO settings

2018-06-11 Thread Sebastian Arcus



On 11/06/18 10:20, Reindl Harald wrote:



Am 11.06.2018 um 10:57 schrieb Sebastian Arcus:


On 11/06/18 09:39, Matus UHLAR - fantomas wrote:

On 11.06.18 08:56, Sebastian Arcus wrote:

I am running SA 4.0.0-r1823176 on Perl 5.26.2. On a number of domains
I administer, outbound mail triggers the SPF_HELO_FAIL rule - but the
regular SPF check passes. I am struggling to see why this is
happening, as the HELO name is set to the same value as the name of
the server/dns name, it has rDNS - and it clearly passes during the
regular SPF check - but not the SPF_HELO check. I have re-checked the
domain settings at mxtoolbox.com - and there doesn't seem to be any
problem. Any ideas please?


do users use SMTP authentication?


Messages submitted over SMTP are authenticated. Other messages are
generated locally on the sending server and passed on the command line
to Exim. All messages hit SPF_HELO_FAIL


Is that visible in headers?


I'm not really sure. Which bit of the headers should contain the
authentication data?


look if exim has a similar feature
http://www.postfix.org/postconf.5.html#smtpd_sasl_authenticated_header



My question is, is this header a requirement? Both servers at both ends 
are configured by me, so I know the smtp submission is authenticated. Is 
the SPF check at the receiving end supposed to fail if it can't find a 
specific header showing the authenticated user at the sending end? What 
is the connection between SPF HELO checks at the receiving server, and 
the user which is submitting the message to the sending server? I'm not 
really following I'm afraid - but I could be missing the point.


Re: SPF_HELO_FAIL triggers on domain with valid SPF record and HELO settings

2018-06-11 Thread Sebastian Arcus



On 11/06/18 09:39, Matus UHLAR - fantomas wrote:

On 11.06.18 08:56, Sebastian Arcus wrote:
I am running SA 4.0.0-r1823176 on Perl 5.26.2. On a number of domains 
I administer, outbound mail triggers the SPF_HELO_FAIL rule - but the 
regular SPF check passes. I am struggling to see why this is 
happening, as the HELO name is set to the same value as the name of 
the server/dns name, it has rDNS - and it clearly passes during the 
regular SPF check - but not the SPF_HELO check. I have re-checked the 
domain settings at mxtoolbox.com - and there doesn't seem to be any 
problem. Any ideas please?


do users use SMTP authentication?


Messages submitted over SMTP are authenticated. Other messages are 
generated locally on the sending server and passed on the command line 
to Exim. All messages hit SPF_HELO_FAIL



Is that visible in headers?


I'm not really sure. Which bit of the headers should contain the 
authentication data?





# spamassassin -D 2>&1 < /test.eml | grep -i spf


we need to see the Received: header.


Sure:

Received: from mail.sinclair-accounting.co.uk ([80.229.84.190]:47700)
by mail.open-t.co.uk with esmtps 
(TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256)
(Exim 4.90)
(envelope-from )
id 1fSIEL-0001Wn-P4
for email_removed; Mon, 11 Jun 2018 09:31:16 +0100





Received: from jucara ([192.168.71.82])
	by mail.sinclair-accounting.co.uk with esmtpsa 
(TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128)

(Exim 4.90_1)
(envelope-from )
id 1fSIEG-0007bx-Lw
for email_removed; Mon, 11 Jun 2018 09:31:10 +0100
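
On the question of which part of the headers shows authentication: the "with" clause of a Received header carries the transmission type, and the keywords ESMTPA and ESMTPSA are the ones registered for sessions that used SMTP AUTH (ESMTPS means TLS without AUTH). The Python sketch below pulls that keyword out; the function names are made up, and the result is only meaningful for Received headers written by servers you trust.

```python
import re

WITH_CLAUSE = re.compile(r"\bwith\s+([A-Za-z0-9]+)", re.I)


def received_protocol(header: str) -> str:
    """Return the upper-cased protocol keyword after 'with', or '' if none."""
    match = WITH_CLAUSE.search(header)
    return match.group(1).upper() if match else ""


def was_authenticated(header: str) -> bool:
    """True for ESMTPA / ESMTPSA, i.e. the client used SMTP AUTH."""
    return received_protocol(header) in {"ESMTPA", "ESMTPSA"}


# Shortened versions of the two headers quoted above.
submission = "from jucara ([192.168.71.82]) by mail.sinclair-accounting.co.uk with esmtpsa (TLSv1.2)"
relay = "from mail.sinclair-accounting.co.uk ([80.229.84.190]:47700) by mail.open-t.co.uk with esmtps (TLSv1.2)"
print(was_authenticated(submission), was_authenticated(relay))
```

In the headers above, only the first hop (from jucara) was an authenticated submission; the server-to-server hop that the SPF checks apply to was not.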


SPF_HELO_FAIL triggers on domain with valid SPF record and HELO settings

2018-06-11 Thread Sebastian Arcus
I am running SA 4.0.0-r1823176 on Perl 5.26.2. On a number of domains I 
administer, outbound mail triggers the SPF_HELO_FAIL rule - but the 
regular SPF check passes. I am struggling to see why this is happening, 
as the HELO name is set to the same value as the name of the server/dns 
name, it has rDNS - and it clearly passes during the regular SPF check - 
but not the SPF_HELO check. I have re-checked the domain settings at 
mxtoolbox.com - and there doesn't seem to be any problem. Any ideas please?


# spamassassin -D 2>&1 < /test.eml | grep -i spf



Jun 11 08:46:30.177 [5534] dbg: spf: checking to see if the message has a Received-SPF header that we can use
Jun 11 08:46:30.341 [5534] dbg: spf: using Mail::SPF for SPF checks
Jun 11 08:46:30.342 [5534] dbg: spf: found Envelope-From in first external Received header
Jun 11 08:46:30.342 [5534] dbg: spf: checking EnvelopeFrom (helo=mail.sinclair-accounting.co.uk, ip=80.229.84.190, envfrom=)
Jun 11 08:46:30.519 [5534] dbg: spf: query for /80.229.84.190/mail.sinclair-accounting.co.uk: result: pass, comment: , text: Mechanism 'mx' matched
Jun 11 08:46:30.758 [5534] dbg: spf: already checked for Received-SPF headers, proceeding with DNS based checks
Jun 11 08:46:30.758 [5534] dbg: spf: checking HELO (helo=mail.sinclair-accounting.co.uk, ip=80.229.84.190)
Jun 11 08:46:30.776 [5534] dbg: spf: query for /80.229.84.190/mail.sinclair-accounting.co.uk: result: fail, comment: Please see http://www.openspf.org/Why?s=helo;id=mail.sinclair-accounting.co.uk;ip=80.229.84.190;r=obelisk.open-t.lan, text: Mechanism '-all' matched
Jun 11 08:46:30.836 [5534] dbg: spf: def_whitelist_from_spf: ser...@sinclair-accounting.co.uk is not in DEF_WHITELIST_FROM_SPF
Jun 11 08:46:30.846 [5534] dbg: rules: ran eval rule SPF_PASS ==> got hit (1)
Jun 11 08:46:30.853 [5534] dbg: rules: ran eval rule SPF_HELO_FAIL ==> got hit (1)


Re: FP with URI_TRY_3LD on get.adobe.com

2018-04-29 Thread Sebastian Arcus


On 27/04/18 16:22, John Hardin wrote:

On Fri, 27 Apr 2018, Sebastian Arcus wrote:



On 27/04/18 10:49, Sebastian Arcus wrote:
I am getting some FP's with URI_TRY_3LD hitting the url get.adobe.com 
in the body of emails:


Apr 27 10:45:39.330 [32173] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "http://get.adobe.com"


Would it be possible to add some exception to this rule - as many 
legitimate emails containing invoice attachments in pdf include the 
above url in the body.


It also appears to not like some DHL url's for some reason:

Apr 27 11:02:05.148 [32339] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "https://mybill.dhl.com"


my{mumble}.mumble.com is targeted. I'll think about that one; the rule 
isn't scored highly and I could see that helping out to detect DHL 
phishing.


If it is detecting DHL phishing, that is good - but if it is triggering on 
both legitimate DHL emails and phishing emails, I'm not sure it is that 
useful?
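
To make the trade-off concrete, here is a Python sketch of a host pattern in the spirit of "my{mumble}.mumble.com" together with an exemption list. The regular expression is a hypothetical stand-in, not the actual URI_TRY_3LD rule (whose pattern is not shown in this thread), and the exemption set simply reuses the host reported above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for the rule: a third-level label that is "get" or
# starts with "my", in front of a two-label domain.
SUSPECT_3LD = re.compile(r"^(?:get|my[a-z0-9-]*)\.[a-z0-9-]+\.[a-z]{2,}$")

# Hosts exempted after false-positive reports (taken from this thread).
EXEMPT_HOSTS = {"get.adobe.com"}


def try_3ld_hit(url: str) -> bool:
    """True if the URL's host matches the pattern and is not exempted."""
    host = (urlsplit(url).hostname or "").lower()
    return host not in EXEMPT_HOSTS and bool(SUSPECT_3LD.match(host))


for url in ("http://get.adobe.com", "https://mybill.dhl.com", "https://www.example.com"):
    print(url, try_3ld_hit(url))
```

With a pattern this broad, a genuine mybill.dhl.com link and a lookalike on another domain both hit, which is why such a rule is only useful with a low score or in combination with other evidence.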


Re: FP with URI_TRY_3LD on get.adobe.com

2018-04-29 Thread Sebastian Arcus


On 27/04/18 16:19, John Hardin wrote:

On Fri, 27 Apr 2018, Sebastian Arcus wrote:

I am getting some FP's with URI_TRY_3LD hitting the url get.adobe.com 
in the body of emails:


Apr 27 10:45:39.330 [32173] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "http://get.adobe.com"


Would it be possible to add some exception to this rule - as many 
legitimate emails containing invoice attachments in pdf include the 
above url in the body.


Fixed.


Thank you


Re: FP with URI_TRY_3LD on get.adobe.com

2018-04-27 Thread Sebastian Arcus


On 27/04/18 10:49, Sebastian Arcus wrote:
I am getting some FP's with URI_TRY_3LD hitting the url get.adobe.com in 
the body of emails:


Apr 27 10:45:39.330 [32173] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "http://get.adobe.com"


Would it be possible to add some exception to this rule - as many 
legitimate emails containing invoice attachments in pdf include the 
above url in the body.


It also appears to not like some DHL url's for some reason:

Apr 27 11:02:05.148 [32339] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "https://mybill.dhl.com"


FP with URI_TRY_3LD on get.adobe.com

2018-04-27 Thread Sebastian Arcus
I am getting some FP's with URI_TRY_3LD hitting the url get.adobe.com in 
the body of emails:


Apr 27 10:45:39.330 [32173] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: "http://get.adobe.com"


Would it be possible to add some exception to this rule - as many 
legitimate emails containing invoice attachments in pdf include the 
above url in the body.


Re: URI_TRY_3LD fp's with QuickBooks Intuit emails

2018-04-13 Thread Sebastian Arcus


On 13/04/18 16:39, John Hardin wrote:

On Fri, 13 Apr 2018, John Hardin wrote:


On Fri, 13 Apr 2018, John Hardin wrote:


On Fri, 13 Apr 2018, Giovanni Bechis wrote:


On 04/13/18 09:06, Sebastian Arcus wrote:


But when it hits, it still adds 2.0 to the score (and I haven't 
customized the score anywhere else). Is this a special form of SA 
syntax?


The score in the current update is 0.001 across the board. Are you 
up-to-date and are you *sure* you don't have any overrides anywhere?


   72_scores.cf:score URI_TRY_3LD    0.001 0.001 0.001 0.001


OK - after more digging it surfaced that the original report with 2.0 
score is from a different server than the one I am testing on. That 
server has 2.0 scores in 4.00/updates_spamassassin_org/72_active.cf


When trying to run sa-update on that server, I am getting errors, so it 
must be that SA stopped updating a while ago there. I will dig in and 
find out why. Thank you for flagging the fact that the default score on 
the current configs is not supposed to be 2.0!
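
When hunting for where a score really comes from, it can help to collect the active score lines from the rule files and ignore the commented-out ones. The Python sketch below does that for a block of .cf text; it assumes the plain "score NAME value [value value value]" form seen in this thread, expands a single value to all four score sets, and skips anything it cannot parse as numbers.

```python
def effective_scores(cf_text: str) -> dict:
    """Map rule name -> four scores; comments ignored, later lines win."""
    scores = {}
    for line in cf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, including whole-line ones
        fields = line.split()
        if len(fields) < 3 or fields[0] != "score":
            continue
        try:
            values = [float(v) for v in fields[2:]]
        except ValueError:
            continue  # non-numeric values are skipped in this sketch
        if len(values) == 1:
            values = values * 4  # one value applies to all four score sets
        scores[fields[1]] = values
    return scores


# The two lines quoted in this thread.
sample = """\
#score   URI_TRY_3LD   2.000   # limit
score URI_TRY_3LD    0.001 0.001 0.001 0.001
"""
print(effective_scores(sample))
```

Running this over each rules directory in turn (on each server) would show which file supplies the 2.0, since the commented-out line contributes nothing.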


Re: URI_TRY_3LD fp's with QuickBooks Intuit emails

2018-04-13 Thread Sebastian Arcus


On 13/04/18 11:36, Giovanni Bechis wrote:

On 04/13/18 09:06, Sebastian Arcus wrote:

Hello all. I am getting some fp's with emails from QuickBooks / Intuit with the 
above rule:

Apr 13 08:00:30.853 [5768] dbg: rules: ran uri rule URI_TRY_3LD ==> got hit: 
"https://myturbotax.intuit.com"

On a slightly different note, and mainly for my curiosity to understand SA 
rules syntax, in 72_active.cf, the score seems to be commented out:

#score   URI_TRY_3LD   2.000   # limit

But when it hits, it still adds 2.0 to the score (and I haven't customized the 
score anywhere else). Is this a special form of SA syntax?


the score is present in rulesrc/sandbox/jhardin/20_misc_testing.cf with tflags 
publish.


Is that a location on the SA server - or am I supposed to have those dirs 
locally here? I can't seem to find them anywhere locally.


URI_TRY_3LD fp's with QuickBooks Intuit emails

2018-04-13 Thread Sebastian Arcus
Hello all. I am getting some fp's with emails from QuickBooks / Intuit 
with the above rule:


Apr 13 08:00:30.853 [5768] dbg: rules: ran uri rule URI_TRY_3LD ==> 
got hit: "https://myturbotax.intuit.com"


On a slightly different note, and mainly for my curiosity to understand 
SA rules syntax, in 72_active.cf, the score seems to be commented out:


#score   URI_TRY_3LD   2.000   # limit

But when it hits, it still adds 2.0 to the score (and I haven't 
customized the score anywhere else). Is this a special form of SA syntax?


Thank you for any answers


[OT] Re: Check for valid MX of sender and rspamd testing

2018-04-10 Thread Sebastian Arcus


On 10/04/18 08:41, Daniele Duca wrote:

On 09/04/2018 20:40, Sebastian Arcus wrote:



This might not really answer your question, but I've had really good 
results leaving all this to the MTA (Exim in my case). I actually go 
for the whole hog full callout verification - checking with the MX 
that the sender really exists. I know that some people are against 
this and say that you get blacklisted - but I've been doing this for 
about 8 months on 4 sites and it has worked very well. I have a local 
full callout verification whitelist - to skip callout verification 
mainly for Microsoft operated domains - which will blacklist you at 
the drop of a hat.

Hello Sebastian,

I'm curious about this approach. I never tried it, but, assuming that 
you check the MX of the envelope from domain, how do you deal with 
poorly-configured-but-legit VPS that use, for example, 
www-d...@hostname.of.the.server ? I have live examples of wordpress and 
vbulletin installations that have non-existent envelope from mailboxes 
or VPS hostnames without MX records. There are also other services that 
actively send email in the form of "nore...@domain.com". If I understood 
correctly, your approach would heavily penalize these senders.


I know that in the ideal world everyone should configure their systems 
neatly, but unfortunately we are far from ideal conditions in real life :/


I'm happy to discuss this technique but I can't really afford the 
administrative overhead I would have with users complaining about 
rejected emails..


Hi Daniele. I agree that configuring a real life system is often a 
balancing act between having a standards compliant and efficient system 
on one side - but at the same time compromising so that the users are 
not too inconvenienced. I started with a configuration which was as 
strict as I preferred, and then gradually loosened things up.


I also think that there is some scope to penalizing badly configured 
systems - if time and circumstances allow. Accepting crap often means 
condoning it - and encouraging systems administrators in sloppy 
practices. Of course, if you can find the time to do this - and not end 
up inconveniencing your own users too much :-)


Generally if emails come from poorly configured servers and they are 
relatively small providers or organisations, I try and liaise with them 
and get them to implement better settings. Fortunately I can do this as 
most of the setups at my end are relatively small - but in larger ones 
that is probably not possible.


For larger providers and domains at the sending end, sometimes I have to 
implement local workarounds and whitelists - as there isn't usually much 
chance to get any cooperation from them.


I believe (but I could be wrong) that the envelope from address should 
be able to receive bounce messages - so I don't think an address of the 
type www-data@server_hostname is acceptable.


Also, I found that most noreply@ type of addresses from clued-up 
providers seem to react correctly to callout verifications and confirm 
the address is real and valid (although they might return a bounceback 
message if you actually try to email them). I think this should be the 
correct way to configure noreply@ addresses. The exception to this is 
pretty much all Microsoft controlled domains and systems - which seem to 
be rubbish at both following standards and also configuring a decent 
email setup. Hence why I have to have a local whitelist and skip 
verification for all MX's of the form *.outlook.com (which include 
Microsoft cloud hosted domains).


Re: Check for valid MX of sender and rspamd testing

2018-04-09 Thread Sebastian Arcus


On 09/04/18 15:24, David Jones wrote:
I was wondering if anyone knows of an SA plugin or another method to 
determine if the envelope-from domain has a valid MX record that is 
listening on TCP port 25.  I don't think it would be a major scorer but 
it could be useful in meta rules.


This might not really answer your question, but I've had really good 
results leaving all this to the MTA (Exim in my case). I actually go for 
the whole hog full callout verification - checking with the MX that the 
sender really exists. I know that some people are against this and say 
that you get blacklisted - but I've been doing this for about 8 months 
on 4 sites and it has worked very well. I have a local full callout 
verification whitelist - to skip callout verification mainly for 
Microsoft operated domains - which will blacklist you at the drop of a 
hat. Pretty much everybody else on the internet seems to understand that 
full callout verification has more advantages than disadvantages in 
fighting spam. I also use Exim to keep count of how many callout 
verifications have failed for an origin IP address, and then start 
rejecting connections after 10 failures in 24 hours - to stop spammers 
from using my boxes as dictionary attack proxies against other domains 
(and getting me blacklisted in the process).


All of this seems to have worked out very well so far - but I realise 
that it will depend on the size of the email system and number of 
mailboxes and all sorts of other things - so it might not work so well 
elsewhere.
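
The counting scheme described above (reject an origin IP once it has caused too many failed callouts within 24 hours) amounts to a sliding-window counter. The Python sketch below shows the idea in isolation; it is not Exim configuration, the class and method names are invented, and the threshold of 10 is the reading of "10/24 hours" assumed here.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24 hours
MAX_FAILURES = 10              # assumed threshold, per the message above


class CalloutFailureLimiter:
    """Track failed callout verifications per origin IP in a sliding window."""

    def __init__(self, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)  # ip -> timestamps of failures

    def _prune(self, ip, now):
        # Forget failures that have aged out of the window.
        queue = self._failures[ip]
        while queue and now - queue[0] >= self.window:
            queue.popleft()

    def record_failure(self, ip, now=None):
        now = time.time() if now is None else now
        self._prune(ip, now)
        self._failures[ip].append(now)

    def should_reject(self, ip, now=None):
        now = time.time() if now is None else now
        self._prune(ip, now)
        return len(self._failures[ip]) >= self.max_failures


limiter = CalloutFailureLimiter()
for second in range(10):
    limiter.record_failure("192.0.2.1", now=second)
print(limiter.should_reject("192.0.2.1", now=20))
```

The explicit now argument is only there to make the example deterministic; a real deployment would keep this state in the MTA itself rather than in a separate script.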


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-09 Thread Sebastian Arcus


On 08/04/18 13:41, David Jones wrote:

On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
I'm not entirely sure what is the cause of this - notification emails 
from The Pension Regulator in UK (a government body overseeing 
pensions) have the destination email in upper case as part of the 
Message-ID. I don't know if the user has input their email address in 
caps when creating the account with TPR, and the system at TPR just 
preserves caps - or maybe their email software does that on purpose 
somehow. In all events, all email notifications from them go straight 
to the Junk folder. Do the standards really require a message id to be 
in all lower case?


I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj1


I added this to the 60_whitelist_auth.cf to trust this sender:

def_whitelist_auth *@*.tpr.gov.uk

This will get pushed out in a couple of days by sa-update.

I know it's not directly addressing your question about the rule's high 
score but this is how I address these types of issues.  If you create a 
"fast lane" for trusted senders then this allows for more aggressive 
tactics/scores for new and untrusted senders.


Thank you David. It sounds like a reasonable solution to me.
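
For anyone unsure which senders a pattern like *@*.tpr.gov.uk covers, the glob can be tried out with Python's fnmatch module. This is only an approximation of how the address pattern is matched, the test addresses are invented, and it leaves out the important second half of def_whitelist_auth: the entry only takes effect for mail that also passes sender authentication (SPF or DKIM).

```python
from fnmatch import fnmatchcase


def matches_whitelist(address: str, patterns) -> bool:
    """True if the address matches any glob pattern, ignoring case."""
    addr = address.lower()
    return any(fnmatchcase(addr, pattern.lower()) for pattern in patterns)


patterns = ["*@*.tpr.gov.uk"]
# Hypothetical sender addresses, for illustration only.
for sender in ("noreply@mail.tpr.gov.uk", "noreply@tpr.gov.uk", "x@tpr.gov.uk.example.net"):
    print(sender, matches_whitelist(sender, patterns))
```

Note that the pattern requires a subdomain, so a sender directly at tpr.gov.uk would not be covered by it.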


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread Sebastian Arcus


On 07/04/18 21:20, Bill Cole wrote:

On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:


Do the standards really require a message id to be in all lower case?


Of course not, and that's also not an accurate description of 
MSGID_SPAM_CAPS.


Only a small minority of rules in SA are based on any external standard. They 
are empirical and pragmatic, not legalistic. There is a complex analysis 
of multiple mail streams  used to generate scores for the rules and to 
decide which rules are good enough to publish in updates, run on a daily 
basis because it takes most of a day to run. The fact that 
MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or 
developer's tag prefix) implies that at some point in the past it was 
reliable enough as an indicator of spam to be part of the default set.


Thank you Bill. That is useful to know.


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread Sebastian Arcus


On 07/04/18 17:22, Antony Stone wrote:

On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:


On 07/04/18 16:52, Reindl Harald wrote something.



Thank you for answering, but really, in effect you haven't answered my
question at all.



And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
defaults.


Please note that there are good reasons why you received only a private
response from this person, and that he is no longer permitted to post to the
list.

My personal recommendation is to consider carefully anything he says, judge
whether you find it useful, and not to reply.


Hi Antony. Thank you kindly for the information. I didn't notice that 
the message was private and not from the list - as the message CC'ed the 
list - so it looked like a regular reply. I will take your advice - 
thank you.




Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread Sebastian Arcus


On 07/04/18 17:14, Reindl Harald wrote:



Am 07.04.2018 um 18:10 schrieb Sebastian Arcus:

And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
defaults

if a single misfired rule turns a BAYES_00 message into a spam message, it's
idiotic - it's that easy - with or without MSGID_SPAM_CAPS that can
happen at any moment in time, and when you trust your bayes, -0.2 is not
justified, and if you don't trust your bayes, train it


A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even 
compared with some of the DNS blacklists rules - and some of those are 
pretty powerful IMHO. Hence why I was trying to understand why this 
rule is assigned such a high score and what is the significance of it.


Secondly, I found in the past that a high negative score for BAYES_00 is 
counter-productive, because:


1. As soon as you receive a spam message with a new type of content, it 
essentially has a free ride until it gets put through the bayes training 
- as the high negative on BAYES_00 counteracts any other rule it hits - 
even pretty effective rules, such as Pyzor and blacklists.


2. Spammers have learned from the above, and I get a lot of spam which 
changes the wording all the time, so that bayes becomes essentially 
ineffective against it - but at the same time it stops other rules from 
working - because of the high negative scores on low BAYES.


3. Spammers have also learned from no.1 , and I see a lot of extremely 
short spam messages - just one short line of few words. Bayes seems to 
be extremely ineffective on these very short messages, no matter how 
much you train it - because of the small amount of data to work on, and 
with a little bit of cunning and varying the words used - they all score 
as BAYES_00. Again, the high negative score gives these spammers a 
guaranteed free ride, as it overrides any other rules.


So at least from the type of spam that I see, BAYES_00 with a large 
negative score is really counter-productive and it makes SA far less 
efficient at picking spam.


BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a 
good quality whitelist rule would, for example. It merely means "I 
haven't really seen this type of spam before", or simply "this message 
is too short and I really can't say anything useful about it". For these 
reasons, I don't think low BAYES scores should be given large negative 
scores - and hence why I changed them on my systems - with really good 
results.
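
The disagreement is easier to follow with the arithmetic written out. The Python sketch below uses the figures quoted in this thread (5.1 points against a threshold of 4.0, BAYES_00 at -0.2, MSGID_SPAM_CAPS at its default of 3.1) and compares them with the other poster's settings (threshold 5.5, BAYES_00 at -3.5). The OTHER_RULES entry is an assumption: it is simply the remainder needed to reach 5.1, and it presumes MSGID_SPAM_CAPS was scored at 3.1 on this system.

```python
def total_score(hits: dict) -> float:
    """Sum the scores of all rules that hit, rounded to avoid float noise."""
    return round(sum(hits.values()), 3)


def is_spam(hits: dict, required: float) -> bool:
    """A message is tagged as spam when its total reaches the threshold."""
    return total_score(hits) >= required


# Scores on the reporting system; OTHER_RULES is the assumed remainder.
site_a = {"MSGID_SPAM_CAPS": 3.1, "BAYES_00": -0.2, "OTHER_RULES": 2.2}
# Same hits with the alternative BAYES_00 score mentioned in the thread.
site_b = {"MSGID_SPAM_CAPS": 3.1, "BAYES_00": -3.5, "OTHER_RULES": 2.2}

print(total_score(site_a), is_spam(site_a, required=4.0))
print(total_score(site_b), is_spam(site_b, required=5.5))
```

The same sums also show the other side of the argument: with BAYES_00 at -3.5, a new kind of spam that hits BAYES_00 would need 9.0 points from other rules to reach a 5.5 threshold.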


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-07 Thread Sebastian Arcus


On 07/04/18 16:52, Reindl Harald wrote:

Content analysis details:   (5.1 points, 4.0 required)

who did set the *non default* required score to 4.0?
why did the person not adjust -0.2 for BAYES_00 too?

the scoring of this system is idiotic!

required score here is 5.5 and BAYES_00 is scored to -3.5 while milter
reject starts with 8.0 so nothing would happen just because *one single*
rule hit wrongly


Thank you for answering, but really, in effect you haven't answered my 
question at all. I was merely trying to understand the MSGID_SPAM_CAPS 
rule - and what rationale it is based on. I know I can alter the score 
just for it - I was trying to understand what other implications this 
might have. I didn't even suggest that SA default config or scoring 
needs to change!


And the way I customise the scores is based on the type of emails 
received at this particular site. It might seem "idiotic" to you, but 
there are reasons for those scores. Not everyone receives the same mix 
of email - so it isn't constructive to start calling other people's 
scoring "idiotic" just because they are not the same as your own or the 
defaults.





Am 07.04.2018 um 17:42 schrieb Sebastian Arcus:

I'm not entirely sure what is the cause of this - notification emails
from The Pension Regulator in UK (a government body overseeing pensions)
have the destination email in upper case as part of the Message-ID. I
don't know if the user has input their email address in caps when
creating the account with TPR, and the system at TPR just preserves caps
- or maybe their email software does that on purpose somehow. In all
events, all email notifications from them go straight to the Junk
folder. Do the standards really require a message id to be in all lower
case?

I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj


MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-07 Thread Sebastian Arcus
I'm not entirely sure what is the cause of this - notification emails 
from The Pension Regulator in UK (a government body overseeing pensions) 
have the destination email in upper case as part of the Message-ID. I 
don't know if the user has input their email address in caps when 
creating the account with TPR, and the system at TPR just preserves caps 
- or maybe their email software does that on purpose somehow. In all 
events, all email notifications from them go straight to the Junk 
folder. Do the standards really require a message id to be in all lower 
case?


I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj1
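
On the standards question: as far as I can tell, RFC 5322 does not require a Message-ID to be lower case, since the left-hand side is a dot-atom-text, which allows letters of either case. Below is a minimal sketch of the kind of check such a rule might make. It assumes the rule keys on a left-hand side made up only of capital letters; the real MSGID_SPAM_CAPS pattern may well differ, and the sample IDs are made up.

```python
import re

# Assumption: the check fires when everything before the '@' in the
# Message-ID is uppercase A-Z. This is a guess at the idea, not the
# actual MSGID_SPAM_CAPS regex.
ALL_CAPS_LEFT = re.compile(r"^\s*<?[A-Z]+@")


def msgid_left_is_all_caps(message_id):
    """Return True when the left-hand side of the Message-ID is all capitals."""
    return bool(ALL_CAPS_LEFT.match(message_id))


if __name__ == "__main__":
    # Hypothetical Message-IDs, for illustration only.
    for sample in ("<ABCDEFGH@example.com>", "<abc123.xyz@example.com>"):
        print(sample, msgid_left_is_all_caps(sample))
```

Running a few real Message-IDs from ham and spam through a check like this is a quick way to judge how often the idea would misfire on a given mail mix.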


Re: FUZZY_XPILL FP hitting all Travelodge emails

2018-04-02 Thread Sebastian Arcus


On 02/04/18 14:58, RW wrote:

On Mon, 2 Apr 2018 08:26:27 -0500
David Jones wrote:


On 04/02/2018 07:18 AM, Sebastian Arcus wrote:

Thank you - one example here: https://pastebin.com/UGStfCys


It found "xon, OX" in "Aylesbury Road, Thame, Oxon, OX9 3AT"

It's an aggressive rule that finds anything that might be an
obfuscated Xanax. It only scores 0.8 points because it can produce FPs
like this.


Actually that is my private, custom score. I think the default is 2.8 or 
something like that.
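
To see how a fuzzy rule can land on a postal address, here is a minimal sketch in Python. The pattern is a made-up stand-in and not the actual FUZZY_XPILL regex: it assumes the rule tolerates look-alike vowels and up to three punctuation or space characters between letters. Under those assumptions it picks out the same fragment as the one reported above.

```python
import re

# Hypothetical stand-in for a fuzzy "xanax" matcher; NOT the real rule.
# Assumptions: vowels may be swapped for look-alikes, and up to three
# non-word characters (or underscores) may sit between letters.
SEP = r"[\W_]{0,3}"
VOWEL = r"[ao0@]"
TOY_FUZZY = re.compile(SEP.join(["x", VOWEL, "n", VOWEL, "x"]), re.IGNORECASE)


def fuzzy_hit(text):
    """Return the matched fragment, or None if the toy pattern does not fire."""
    match = TOY_FUZZY.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(fuzzy_hit("Aylesbury Road, Thame, Oxon, OX9 3AT"))
```

The looser the separator and look-alike classes, the more ordinary text (county abbreviations followed by postcodes, in this case) falls inside the pattern, which is why such rules tend to carry modest scores.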


Re: FUZZY_XPILL FP hitting all Travelodge emails

2018-04-02 Thread Sebastian Arcus


On 02/04/18 14:26, David Jones wrote:

On 04/02/2018 07:18 AM, Sebastian Arcus wrote:

Thank you - one example here: https://pastebin.com/UGStfCys


On 02/04/18 13:10, Kevin A. McGrail wrote:

Pastebin a sample(s).

On Mon, Apr 2, 2018, 08:06 Sebastian Arcus <s.ar...@open-t.co.uk 
<mailto:s.ar...@open-t.co.uk>> wrote:


    I have a client which handles a lot of hotel bookings as part of their
    work - and all hotel booking confirmations coming from Travelodge (a UK
    hotel chain) hit FUZZY_XPILL.

    I've tried looking at the regex of the rule, but can't quite get my head
    around what it is supposed to do, and can't figure out why it triggers
    on all the Travelodge emails either. Could anybody provide some hints -
    or have others seen this as well? I can provide some sample mail, if it
    helps. Thank you.



I have added an entry to 60_whitelist_auth.cf to help with this in all 
SA instances that run sa-update regularly.  This will be out there in a 
couple of days trusting email from that sender when there is an SPF_PASS 
or DKIM_VALID_AU hit.


def_whitelist_auth *@travelodge.co.uk

These emails from Travelodge are important enough to be DKIM signed as 
well for http://dkimwl.org which I would eventually like to get added to 
the default SA ruleset.


Thank you very much for the fix and for the quick replies.


Re: FUZZY_XPILL FP hitting all Travelodge emails

2018-04-02 Thread Sebastian Arcus

On 02/04/18 13:35, Pedro David Marco wrote:

Sebastian,

can you run

spamassassin -D -t < MESSAGE_FILE 2>&1 | grep got | grep FUZZY_XPILL


and post the result, please?



Hi Pedro. Please find the output below:

Apr  2 15:45:59.961 [6928] dbg: rules: ran body rule FUZZY_XPILL ==> 
got hit: "xon, OX"


Re: FUZZY_XPILL FP hitting all Travelodge emails

2018-04-02 Thread Sebastian Arcus

Thank you - one example here: https://pastebin.com/UGStfCys


On 02/04/18 13:10, Kevin A. McGrail wrote:

Pastebin a sample(s).

On Mon, Apr 2, 2018, 08:06 Sebastian Arcus <s.ar...@open-t.co.uk 
<mailto:s.ar...@open-t.co.uk>> wrote:


I have a client which handles a lot of hotel bookings as part of their
work - and all hotel booking confirmations coming from Travelodge (a UK
hotel chain) hit FUZZY_XPILL.

I've tried looking at the regex of the rule, but can't quite get my head
around what it is supposed to do, and can't figure out why it triggers
on all the Travelodge emails either. Could anybody provide some hints -
or have others seen this as well? I can provide some sample mail, if it
helps. Thank you.



FUZZY_XPILL FP hitting all Travelodge emails

2018-04-02 Thread Sebastian Arcus
I have a client which handles a lot of hotel bookings as part of their 
work - and all hotel booking confirmations coming from Travelodge (a UK 
hotel chain) hit FUZZY_XPILL.


I've tried looking at the regex of the rule, but can't quite get my head 
around what it is supposed to do, and can't figure out why it triggers 
on all the Travelodge emails either. Could anybody provide some hints - 
or have others seen this as well? I can provide some sample mail, if it 
helps. Thank you.


Re: BODY custom rule not working if text and html parts are different?

2018-04-02 Thread Sebastian Arcus


On 01/04/18 19:18, John Hardin wrote:

On Sun, 1 Apr 2018, John Hardin wrote:


On Sun, 1 Apr 2018, Matus UHLAR - fantomas wrote:


On 01.04.18 05:47, Pedro David Marco wrote:

This is a problem I see often...
what if the URL is only in the TEXT part and not in the HTML? Many 
email applications show those URLs as clickable as if they were valid 
HTML HREFs when they are not...


in this case, body rule matches, but uri does not.


I think there are heuristics to pull (non-obfuscated) URIs out of body 
text.


Yeah, just confirmed. A non-obfuscated URI in plain-text body part is 
recognized and extracted for uri rules.


That's great - thank you for testing this out and letting us know.


Re: BODY custom rule not working if text and html parts are different?

2018-04-01 Thread Sebastian Arcus


On 01/04/18 07:10, Matus UHLAR - fantomas wrote:

On 01.04.18 05:47, Pedro David Marco wrote:

This is a problem I see often...
what if the URL is only in the TEXT part and not in the HTML? Many 
email applications show those URLs as clickable as if they were valid 
HTML HREFs when they are not...


in this case, body rule matches, but uri does not.


I wonder if RAWBODY would match the url both in the text part and in the 
html part? Does anybody know?


Re: BODY custom rule not working if text and html parts are different?

2018-03-31 Thread Sebastian Arcus


On 31/03/18 22:39, John Hardin wrote:

On Sat, 31 Mar 2018, Sebastian Arcus wrote:

I have a really simple rule looking for custom text string contained 
in spam urls in the body of the email, like so:


body  SHORT_BITCOIN_DATING    /specific_string_here/i
score SHORT_BITCOIN_DATING    3.0
describe  SHORT_BITCOIN_DATING    Body URL signature of spam

I just realised that it is only working if the URL exists in both the 
text and html versions. If the text version doesn't have the url, it 
isn't working. Do "body" rules only work on the html part of the 
message? I've tried searching through the documentation, but I can't 
see that being the case. Maybe there is something else having an 
effect here?


"body" includes the *rendered* part of HTML. If the URL only appears 
within a tag in the HTML part then "body" will not see it.


If you are looking for URLs, you should probably be using a "uri" rule. 
There are heuristics to pull those out of the body text, as well as out of 
HTML tags.


Thank you for the suggestions - much appreciated. As my original rule 
worked initially, I didn't realise the subtle difference between using 
BODY and URI rules. It is working fine now. Thank you again!
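
For anyone hitting the same thing, the difference can be sketched in Python. This is only a rough analogy, not how SpamAssassin itself is implemented: it separates the text a reader would see from the link targets hidden in tags. The class name, function name and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class RenderedTextAndLinks(HTMLParser):
    """Collect visible text and href values separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Link targets live in attributes, so they never reach handle_data.
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_html(html):
    """Return (rendered_text, list_of_hrefs) for an HTML fragment."""
    parser = RenderedTextAndLinks()
    parser.feed(html)
    parser.close()
    rendered = " ".join("".join(parser.text_parts).split())
    return rendered, parser.hrefs


if __name__ == "__main__":
    sample = '<p>Hi, <a href="http://example.com/specific_string_here">click here</a></p>'
    rendered, hrefs = split_html(sample)
    print(rendered)  # what a "body"-style match would look at
    print(hrefs)     # what a "uri"-style match would look at
```

A pattern searched only in the rendered text misses a string that exists solely in the href, which matches the behaviour described above.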


BODY custom rule not working if text and html parts are different?

2018-03-31 Thread Sebastian Arcus
I have a really simple rule looking for custom text string contained in 
spam urls in the body of the email, like so:


body      SHORT_BITCOIN_DATING    /specific_string_here/i
score     SHORT_BITCOIN_DATING    3.0
describe  SHORT_BITCOIN_DATING    Body URL signature of spam

I just realised that it is only working if the URL exists in both the 
text and html versions. If the text version doesn't have the url, it 
isn't working. Do "body" rules only work on the html part of the 
message? I've tried searching through the documentation, but I can't see 
that being the case. Maybe there is something else having an effect here?


Many thanks for any hints.


Re: T_DKIM_INVALID false positives with Gmail

2018-03-19 Thread Sebastian Arcus

On 19/03/18 15:53, Bill Cole wrote:

On 19 Mar 2018, at 11:29, Sebastian Arcus wrote:

I've been seeing a number of false positives recently from 
T_DKIM_INVALID with Gmail emails. Are some Gmail servers 
misconfigured, or could something be going on at my end? The DKIM 
record which is flagged as invalid is below:


DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=googlemail.com; s=20161025;
    h=mime-version:from:date:message-id:subject:to;
    bh=8wlgvdpEOmUO2ugslPxRkFYA/ZThwu2bWy5VmlR76ug=;
    b=gRcnOIzmENqS8a91mSdETdXvyH6df7u0tSwsadk6CMD0KtAbzuM3ojHW+kPEo7AB1i
     vnbCDc/vsR6H7pP0k3hZmF7z/dAaeZWD4RVzqM+Fv70oHy4af64j+fGSekOCM9o4ShRQ
     Vk3KyF+69sKTK3rRWEnfrcgi/pN2DJWDvrIBRjmFOZYKNVN+8elaVM9DOO7tEMLYuw7T
     +sVaUMNt8MuPxRhrskJYOIxK8zzkcJHYV+1TuWJuqZAHRVwgnDWX7q3Wx0GwrX+3lKpm
     3A1+F5dBVjH4dXvdfIESm5XpV8b9uBn9daGWrUgkR+PB23XsL9QkxEqCRXdgII3FRxtQ
     Ps6A==


There are LOTS of ways to break a DKIM signature. Whether that one is 
broken can't be checked and how it might have been broken can't be 
guessed at without the full *unmodified* headers and body of the message.


I use Exim to pass stuff directly to SA. Could I attach the DKIM header 
in a text file and send it to the list?
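
While the header alone is not enough to verify anything, it does show which headers the signature covers, and a change to any of those (or to the body) in transit is one way a signature breaks. A minimal sketch for pulling the tags apart is below; the function name is made up, whitespace inside values is simply dropped for convenience, and the b= value is shortened to its first line.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature value into a {tag: value} dict."""
    tags = {}
    for item in header_value.split(";"):
        # Split on the first '=' only, since base64 values may end in '='.
        name, sep, value = item.partition("=")
        if sep:
            # Whitespace from header folding is dropped here for simplicity.
            tags[name.strip()] = "".join(value.split())
    return tags


# Header value taken from the message above, with b= cut to its first line.
SAMPLE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20161025; "
    "h=mime-version:from:date:message-id:subject:to; "
    "bh=8wlgvdpEOmUO2ugslPxRkFYA/ZThwu2bWy5VmlR76ug=; "
    "b=gRcnOIzmENqS8a91mSdETdXvyH6df7u0tSwsadk6CMD0KtAbzuM3ojHW+kPEo7AB1i"
)

if __name__ == "__main__":
    tags = parse_dkim_tags(SAMPLE)
    print("signing domain:", tags["d"])
    print("signed headers:", tags["h"].split(":"))
```

The signed header list (h=) is the first thing worth comparing against the message as delivered to SA, since anything that rewrites one of those headers before the check will invalidate the signature.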


T_DKIM_INVALID false positives with Gmail

2018-03-19 Thread Sebastian Arcus
I've been seeing a number of false positives recently from 
T_DKIM_INVALID with Gmail emails. Are some Gmail servers misconfigured, 
or could something be going on at my end? The DKIM record which is 
flagged as invalid is below:


DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=googlemail.com; s=20161025;
    h=mime-version:from:date:message-id:subject:to;
    bh=8wlgvdpEOmUO2ugslPxRkFYA/ZThwu2bWy5VmlR76ug=;
    b=gRcnOIzmENqS8a91mSdETdXvyH6df7u0tSwsadk6CMD0KtAbzuM3ojHW+kPEo7AB1i
     vnbCDc/vsR6H7pP0k3hZmF7z/dAaeZWD4RVzqM+Fv70oHy4af64j+fGSekOCM9o4ShRQ
     Vk3KyF+69sKTK3rRWEnfrcgi/pN2DJWDvrIBRjmFOZYKNVN+8elaVM9DOO7tEMLYuw7T
     +sVaUMNt8MuPxRhrskJYOIxK8zzkcJHYV+1TuWJuqZAHRVwgnDWX7q3Wx0GwrX+3lKpm
     3A1+F5dBVjH4dXvdfIESm5XpV8b9uBn9daGWrUgkR+PB23XsL9QkxEqCRXdgII3FRxtQ
     Ps6A==


Re: Extremely persistent sex/make money spam with very little text in the body

2018-03-07 Thread Sebastian Arcus


On 07/03/18 11:25, Leandro wrote:
2018-03-07 5:52 GMT-03:00 Sebastian Arcus <s.ar...@open-t.co.uk 
<mailto:s.ar...@open-t.co.uk>>:



6. The links they include in the body of the email are almost never
flagged up either by Clam or Spamassassin - and they point to a
different domain in every single message.


Although they use multiple domains in the URLs in the body, many of these 
URLs resolve to the same IPv4/IPv6 address or IP range, that is, 
just one shared web server or a group of shared web servers of the spammer.


The key to solving this problem is for all of you to start cross-referencing 
the data and scoring the URL host IP, which is the exact physical place they 
want you to visit, even when the mail is sent through many hacked mail 
servers around the world and many distinct domains. The mail servers and 
domains are very dispersed, but the web servers are very concentrated.


As far as I can tell, the URLs in the spam I see point to php scripts 
on various compromised servers - which, maybe, further redirect to the 
final payment servers. But thank you for the suggestion - I will keep an 
eye on it.
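
A minimal sketch of how that check could be tried on a batch of collected URLs is below. The function name and the lookup table are made up; the resolver is passed in as a parameter so the grouping can be tested without touching DNS. In real use something like socket.gethostbyname, wrapped to return None on failure, could be supplied instead.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_urls_by_ip(urls, resolve):
    """Map each resolved IP to the set of hostnames seen for it.

    `resolve` is any callable that takes a hostname and returns an IP
    string, or None when the lookup fails.
    """
    by_ip = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue
        ip = resolve(host)
        if ip:
            by_ip[ip].add(host)
    return dict(by_ip)


if __name__ == "__main__":
    # Made-up lookup table standing in for DNS (documentation address ranges).
    fake_dns = {
        "a.example": "192.0.2.10",
        "b.example": "192.0.2.10",
        "c.example": "198.51.100.7",
    }
    urls = [
        "http://a.example/x.php",
        "http://b.example/y.php",
        "http://c.example/z.php",
    ]
    for ip, hosts in group_urls_by_ip(urls, fake_dns.get).items():
        print(ip, sorted(hosts))
```

If the spam URLs really sit on unrelated compromised servers, the groups will mostly hold one host each; a few large groups would support the shared-server theory.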


Re: Extremely persistent sex/make money spam with very little text in the body

2018-03-07 Thread Sebastian Arcus

On 07/03/18 09:08, Daniele Duca wrote:

On 07/03/2018 09:52, Sebastian Arcus wrote:

I have this one email account receiving, for more than a year, a very 
specific type of spam which I find very difficult to block:


1. The messages are all kept very short, generally below 20 words - I 
assume so that Bayes is less efficient at classifying them?


2. Although they are all invitations to sex, or making money - they 
are phrased differently every time and use different words - so Bayes 
scores are consistently low.



Hi Sebastian,

I perfectly know what type of email you are talking about, I've seen 
them written at least in italian, english and spanish. If you click the 
link you are being redirected to shady dating websites or 
bitcoin/investment scams sites (at least in my experience).


Since I get the majority of these emails in italian, I've written a meta 
rule that takes into account:


- Common misspelled words/phrases
- Body lines must be < 5
- The common pattern in all the urls. Take a close look at them, there 
IS a pattern, not writing it here for obvious reasons :)


Thank you so much for that! The emails I see don't usually have spelling 
mistakes, but you are right, it seems that the url is the way to go. 
I've been looking for patterns in the headers and source servers all 
along - it never crossed my mind to check the body! Thanks again


Extremely persistent sex/make money spam with very little text in the body

2018-03-07 Thread Sebastian Arcus
I have this one email account receiving, for more than a year, a very 
specific type of spam which I find very difficult to block:


1. The messages are all kept very short, generally below 20 words - I 
assume so that Bayes is less efficient at classifying them?


2. Although they are all invitations to sex, or making money - they are 
phrased differently every time and use different words - so Bayes scores 
are consistently low.


3. They come from servers all around the world - possibly compromised, 
or maybe quickly setup and taken down - so they are usually not flagged 
by blacklists


4. Pyzor tends to flag most of them up though.

5. In most cases, DKIM is correct, SPF is fine, and the headers are all 
correct - so they don't hit any other rules.


6. The links they include in the body of the email are almost never 
flagged up either by Clam or Spamassassin - and they point to a 
different domain in every single message.


The bizarre thing is that I only see them coming to this one particular 
email account, at a single domain of all the ones I administer. Based on 
the above, whoever sends them really knows what they are doing, and must 
have significant resources at their disposal - but I still have no idea 
why they only hit this particular email address. I can only assume that 
greylisting wouldn't help much, as they seem to arrive from properly 
configured SMTP servers, which would retry like any other regular SMTP 
server and bypass greylisting. Has anybody else seen these, and is there 
anything else that I could try to block them?


Re: IADB whitelist - again

2018-03-02 Thread Sebastian Arcus


On 01/03/18 19:50, David Jones wrote:

On 03/01/2018 12:29 PM, Sebastian Arcus wrote:
I know I have brought up this issue on this list before, and sorry for 
the persistence, but having 7 different rules adding scores for the 
IADB whitelist still seems either ridiculous, or outright suspect:


-0.2 RCVD_IN_IADB_RDNS  RBL: IADB: Sender has reverse DNS record
  [199.127.240.84 listed in iadb.isipp.com]
-0.1 RCVD_IN_IADB_SPF   RBL: IADB: Sender publishes SPF record
-0.1 RCVD_IN_IADB_OPTIN RBL: IADB: All mailing list mail is opt-in
-0.0 RCVD_IN_IADB_SENDERID  RBL: IADB: Sender publishes Sender ID record
-0.0 RCVD_IN_IADB_LISTED    RBL: Participates in the IADB system
-0.1 RCVD_IN_IADB_DK    RBL: IADB: Sender publishes Domain Keys record
-0.1 RCVD_IN_IADB_VOUCHED   RBL: ISIPP IADB lists as vouched-for sender


It really raises some very uncomfortable questions regarding the 
impartiality of SA and/or its anti-spam capabilities. And by the way, 
this message is definitely unsolicited, and in no way did we give any 
sort of permission or consent to this company or its "affiliates" to 
email us - so the whole "All mailing list mail is opt-in" is nonsense.


And why have "Sender has reverse DNS record" and "Sender publishes SPF 
record" as separate IADB rules - when SA itself already checks for 
these? Isn't this just a glaring way of pumping up SA scores for the 
IADB subscribers?


Once in a while, even the best senders can get a bad customer of theirs 
that obtained email addresses by a violation of their terms and conditions.


Just block that sender with a local "blacklist_from *@example.com" entry 
and report it to SpamCop.  If the message headers have any abuse 
reporting information then send the headers there too.  They should do 
their own internal investigation and shutdown that bad customer of theirs.


That is still beside the point. There is simply no reason in the 
interest of SA as an antispam solution to publish all those rules. One 
or two rules would be more than enough. I know I can block this and that 
in SA, and tweak rules all the time - but I am concerned when the 
default settings in SA effectively facilitate marketing companies to 
stuff my Inbox full of junk. In that case you would achieve better 
results not using SA at all. As to reporting bad senders and "internal 
investigation" - my experience shows that doesn't get very far with any 
providers.


Re: IADB whitelist - again

2018-03-02 Thread Sebastian Arcus


On 01/03/18 19:04, John Hardin wrote:

On Thu, 1 Mar 2018, Sebastian Arcus wrote:

I know I have brought up this issue on this list before, and sorry for 
the persistence, but having 7 different rules adding scores for the 
IADB whitelist still seems either ridiculous, or outright suspect:


-0.2 RCVD_IN_IADB_RDNS  RBL: IADB: Sender has reverse DNS record
    [199.127.240.84 listed in iadb.isipp.com]
-0.1 RCVD_IN_IADB_SPF   RBL: IADB: Sender publishes SPF record
-0.1 RCVD_IN_IADB_OPTIN RBL: IADB: All mailing list mail is opt-in
-0.0 RCVD_IN_IADB_SENDERID  RBL: IADB: Sender publishes Sender ID record
-0.0 RCVD_IN_IADB_LISTED    RBL: Participates in the IADB system
-0.1 RCVD_IN_IADB_DK    RBL: IADB: Sender publishes Domain Keys record
-0.1 RCVD_IN_IADB_VOUCHED   RBL: ISIPP IADB lists as vouched-for sender


It really raises some very uncomfortable questions regarding the 
impartiality of SA and/or its anti-spam capabilities. And by the way, 
this message is definitely unsolicited, and in no way did we give any 
sort of permission or consent to this company or its "affiliates" to 
email us - so the whole "All mailing list mail is opt-in" is nonsense.


And why have "Sender has reverse DNS record" and "Sender publishes SPF 
record" as separate IADB rules - when SA itself already checks for 
these? Isn't this just a glaring way of pumping up SA scores for the 
IADB subscribers?


Don't assume malice right off the bat. More likely it is that IADB 
provides all those status codes and SA exposes a rule for each, with 
minimal scores, to allow local tuning if desired.


But why does SA have to expose a rule for each and every code IADB 
provides? SA is an antispam solution, IADB is a facilitator for the 
marketing industry (in spite of their continuous protestations on this 
list). The goals of the two are not the same. Surely SA can decide by 
itself what is really useful from a spam filtering point of view - not 
churn out whatever it gets passed by marketing whitelists? SA uses other 
whitelists (some may I say a lot more useful than IADB), and it only 
exposes one or two rules for each.




Also, there is RCVD_IN_IADB_DOPTIN, so RCVD_IN_IADB_OPTIN may be 
"someone somehow gave us your name somewhere" (i.e. "single opt-in") 
rather than "we confirmed you actually want to receive our garbage" 
("double opt-in").


So effectively pretty useless, as if you ever made the mistake of 
forgetting to untick the "receive email from our carefully selected 
partners" in the past, you will never be able to take that consent back 
as your email address gets passed from entity to entity. Consent to be 
emailed marketing material is a joke - and SA shouldn't be a facilitator 
- otherwise its value as a spam filter is gone.




The scores appear hardcoded (50_scores.cf) vs. from masscheck 
(72_scores.cf) so they may be *very* stale.


In that case maybe at least some of the rules should be removed then


IADB whitelist - again

2018-03-01 Thread Sebastian Arcus
I know I have brought up this issue on this list before, and sorry for 
the persistence, but having 7 different rules adding scores for the IADB 
whitelist still seems either ridiculous, or outright suspect:


-0.2 RCVD_IN_IADB_RDNS  RBL: IADB: Sender has reverse DNS record
 [199.127.240.84 listed in iadb.isipp.com]
-0.1 RCVD_IN_IADB_SPF   RBL: IADB: Sender publishes SPF record
-0.1 RCVD_IN_IADB_OPTIN RBL: IADB: All mailing list mail is opt-in
-0.0 RCVD_IN_IADB_SENDERID  RBL: IADB: Sender publishes Sender ID record
-0.0 RCVD_IN_IADB_LISTED    RBL: Participates in the IADB system
-0.1 RCVD_IN_IADB_DK    RBL: IADB: Sender publishes Domain Keys record
-0.1 RCVD_IN_IADB_VOUCHED   RBL: ISIPP IADB lists as vouched-for sender


It really raises some very uncomfortable questions regarding the 
impartiality of SA and/or its anti-spam capabilities. And by the way, 
this message is definitely unsolicited, and in no way did we give any sort 
of permission or consent to this company or its "affiliates" to email us 
- so the whole "All mailing list mail is opt-in" is nonsense.


And why have "Sender has reverse DNS record" and "Sender publishes SPF 
record" as separate IADB rules - when SA itself already checks for 
these? Isn't this just a glaring way of pumping up SA scores for the 
IADB subscribers?


Re: Spamassassin DNS problems

2018-01-15 Thread Sebastian Arcus


On 10/01/18 12:14, ter...@web.de wrote:

Hi.
I found your spamassassin problem while looking for answers to my problem:
http://spamassassin.1065346.n5.nabble.com/Dns-Blocklists-always-returning-0-records-td124564.html

It seems the problem you have/had is exactly the problem I have!
Sadly there is no solution in the thread.
Did you manage to find a solution?


Hi. I did reply to the list at the time in case others would find the 
information useful. For some strange reason, the SA mailing list archive 
doesn't seem to include my last reply. I've enclosed it below:



On 17/05/17 18:11, Sebastian Arcus wrote:

Just a follow-up and clarification on this issue - after more testing, 
it seems that it was the Spamassassin version which was the problem. I 
have had to upgrade SA on 7 servers running 3.4.1 on Slackware - as the 
dns rbl's weren't working on any of them. The only server I had with SA 
3.4.0 *was* actually working correctly. After upgrading all the boxes to 
4.0.0, the dns rbl's are now working correctly. I have *not* changed any 
configuration options in SA - I left all the servers as they were in 
this respect - so it seems it was not a configuration issue.


I'm afraid I haven't been able to narrow it down further than this. The 
servers were all running various kernels, both x86 and x86_64 
architectures, and several different versions of Perl - so I would guess 
the SA version was the common factor and the likely culprit.


Re: IADB whitelist

2017-12-26 Thread Sebastian Arcus

On 25/12/17 23:57, Bill Cole wrote:

On 25 Dec 2017, at 3:28 (-0500), Sebastian Arcus wrote:

Also, any idea why are there 6 different rules associated with this 
particular whitelist?


IADB has many independent return codes that each have distinct meaning. 
See 
http://www.isipp.com/email-accreditation/about-the-codes/list-of-codes/ 
for details.


If you get mail from an IADB-listed sender that you are 100% sure is 
spam (i.e. not "I would never ask for such mail" but "the recipient 
absolutely did not consent to receiving this mail.") then you should 
report that to ISIPP. "ab...@suretymail.com" is the reporting address 
listed on their website and while I've not had cause to use it, people I 
trust with no reason to lie say that reports to that address do actually 
work to either change sender behavior or eliminate listings. Anne 
Mitchell (head of ISIPP) is an ex-coworker of mine whose integrity and 
dedication to the anti-spam fight (which is dependent on keeping 
*wanted* mail deliverable) I can personally vouch for.


However, the different responses from IADB are VERY nuanced and the two 
strongest rules you listed (RCVD_IN_IADB_OPTIN and RCVD_IN_IADB_VOUCHED) 
are essentially "good intentions" markers. Due to unfortunate 
terminology choices by ISIPP and a willingness to engage in nuance and 
estimate intentions, those aren't really as worthwhile as they might 
seem. The IADB definition of "All mailing list mail is opt-in" is 
(effectively) "we believe that this ESP believes in good faith that 
every recipient has chosen to receive this mail." Their "vouching" for a 
record is an assertion that either the ESP is personally known to ISIPP 
staff as competent and honest OR has maintained stable positive listings 
for >6 months. I'm pretty sure I don't want ANY score for a non-vouched 
record and unlike ISIPP (and some valuable SA contributors!) I really 
don't care much about ESPs' intentions or responsiveness to complaints, 
only about actual spamming behavior. So I have made substantial 
modification on my own system to how IADB results are scored, but those 
specific adjustments are probably not fit for most other sites.


Thank you for a detailed reply. Like you as well, I don't put much 
weight on what ESP's say they do or intend to do. I'm afraid the email 
marketing industry is rather murky and the line between legitimate 
marketing and spamming is often pretty much non-existent - with 
apologies to those few operators who actually run an honest operation. I 
see daily examples of supposedly legit operators who don't actually act 
on unsubscribe requests, or 'magically' re-subscribe after a while, or 
simply get around rules by creating a new list and re-subscribing 
everybody who unsubscribed. And frankly, the whole issue of consent is 
blurred beyond any usefulness. If you have ever made the mistake of 
leaving the tick box selected for "receive offers from our carefully 
selected partners", it is virtually impossible to take that consent 
back, as your email address gets passed from database to database, never 
to be removed again. Besides, with most people purchasing things from so 
many different sources, and creating accounts on so many websites, how 
many would actually be able to say for sure (and prove it) that they 
never gave consent to be emailed by "carefully selected partners"? So 
you will excuse me if I take any whitelist which helps marketing 
emailing lists "improve deliverability" with a very big dollop of salt.


Re: IADB whitelist

2017-12-25 Thread Sebastian Arcus


On 25/12/17 10:45, Reindl Harald wrote:



Am 25.12.2017 um 09:28 schrieb Sebastian Arcus:

On 23/12/17 10:01, Kevin A. McGrail wrote:
The 1st step is that a representative of the rbl asks us to consider 
for inclusion.


Thank you. If enough people receive spam sanctioned by a particular 
whitelist, will the minus scores associated with their rule(s) be 
reduced over time?


maybe, but why not just override the score in local.cf

/etc/mail/spamassassin/local-*.cf
score RCVD_IN_IADB_DK -0.3
score RCVD_IN_IADB_DOPTIN -1.0
score RCVD_IN_IADB_DOPTIN_GT50 -0.5
score RCVD_IN_IADB_DOPTIN_LT50 -0.1
score RCVD_IN_IADB_LISTED -0.001
score RCVD_IN_IADB_ML_DOPTIN -2.5
score RCVD_IN_IADB_OPTIN -0.05
score RCVD_IN_IADB_OPTIN_GT50 -0.2
score RCVD_IN_IADB_OPTIN_LT50 -0.1
score RCVD_IN_IADB_RDNS -0.05
score RCVD_IN_IADB_SENDERID -0.5
score RCVD_IN_IADB_SPF -0.1
score RCVD_IN_IADB_VOUCHED -2.0


I know I can override the scores for all sorts of things in local.cf. 
The reason I was raising the question is because I was wondering if 
whitelists can be used by unscrupulous marketing organisations to 
effectively undo what is one of the main functions of SA - to reduce or 
stop unsolicited email.




Also, any idea why are there 6 different rules associated with this 
particular whitelist?


these are 6 different lists, just read the description you even posted 
on the right side of the score


Well, they might be technically 6 different lists, but IADB is one 
single entity, and including 6 different whitelists from them only looks 
like a way to reduce the SA score for email from their "certified" 
senders further. After all SA already checks separately for things like 
RDNS, DKIM, SPF.







On December 23, 2017 3:03:26 AM EST, Sebastian Arcus 
<s.ar...@open-t.co.uk> wrote:


    What is the process of including whitelists in SA default configs? It is
    not the first time I see pretty obvious mailing list spam which has
    quite high minus scores from 2-3 whitelists included in SA:

    -1.5 RCVD_IN_IADB_OPTIN RBL: IADB: All mailing list mail is opt-in
       [205.201.128.83 listed in iadb.isipp.com]
    -0.1 RCVD_IN_IADB_DK    RBL: IADB: Sender publishes Domain Keys record
    -0.2 RCVD_IN_IADB_RDNS  RBL: IADB: Sender has reverse DNS record
    -0.0 RCVD_IN_IADB_SENDERID  RBL: IADB: Sender publishes Sender ID record
    -2.2 RCVD_IN_IADB_VOUCHED   RBL: ISIPP IADB lists as vouched-for sender
    -0.1 RCVD_IN_IADB_SPF   RBL: IADB: Sender publishes SPF record
    -0.0 RCVD_IN_IADB_LISTED    RBL: Participates in the IADB system
    -0.0 RCVD_IN_IADB_OPTIN_GT50 RBL: IADB: Opt-in used more than 50% of the
    time


    For the same message, Razor2 has a high score - which is correct:

    2.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
   [cf: 100]
    2.5 RAZOR2_CHECK   Listed in Razor2 (http://razor.sf.net/)


Re: IADB whitelist

2017-12-25 Thread Sebastian Arcus

On 23/12/17 10:01, Kevin A. McGrail wrote:
The 1st step is that a representative of the rbl asks us to consider for 
inclusion.


Thank you. If enough people receive spam sanctioned by a particular 
whitelist, will the minus scores associated with their rule(s) be 
reduced over time? Also, any idea why are there 6 different rules 
associated with this particular whitelist?





Regards,
KAM

On December 23, 2017 3:03:26 AM EST, Sebastian Arcus 
<s.ar...@open-t.co.uk> wrote:


What is the process of including whitelists in SA default configs? It is
not the first time I see pretty obvious mailing list spam which has
quite high minus scores from 2-3 whitelists included in SA:

-1.5 RCVD_IN_IADB_OPTIN RBL: IADB: All mailing list mail is opt-in
   [205.201.128.83 listed in iadb.isipp.com]
-0.1 RCVD_IN_IADB_DK    RBL: IADB: Sender publishes Domain Keys record
-0.2 RCVD_IN_IADB_RDNS  RBL: IADB: Sender has reverse DNS record
-0.0 RCVD_IN_IADB_SENDERID  RBL: IADB: Sender publishes Sender ID record
-2.2 RCVD_IN_IADB_VOUCHED   RBL: ISIPP IADB lists as vouched-for sender
-0.1 RCVD_IN_IADB_SPF   RBL: IADB: Sender publishes SPF record
-0.0 RCVD_IN_IADB_LISTED    RBL: Participates in the IADB system
-0.0 RCVD_IN_IADB_OPTIN_GT50 RBL: IADB: Opt-in used more than 50% of the
time


For the same message, Razor2 has a high score - which is correct:

2.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
   [cf: 100]
2.5 RAZOR2_CHECK   Listed in Razor2 (http://razor.sf.net/)



IADB whitelist

2017-12-23 Thread Sebastian Arcus
What is the process of including whitelists in SA default configs? It is 
not the first time I see pretty obvious mailing list spam which has 
quite high minus scores from 2-3 whitelists included in SA:


-1.5 RCVD_IN_IADB_OPTIN RBL: IADB: All mailing list mail is opt-in
 [205.201.128.83 listed in iadb.isipp.com]
-0.1 RCVD_IN_IADB_DK    RBL: IADB: Sender publishes Domain Keys record
-0.2 RCVD_IN_IADB_RDNS  RBL: IADB: Sender has reverse DNS record
-0.0 RCVD_IN_IADB_SENDERID  RBL: IADB: Sender publishes Sender ID record
-2.2 RCVD_IN_IADB_VOUCHED   RBL: ISIPP IADB lists as vouched-for sender
-0.1 RCVD_IN_IADB_SPF   RBL: IADB: Sender publishes SPF record
-0.0 RCVD_IN_IADB_LISTED RBL: Participates in the IADB system
-0.0 RCVD_IN_IADB_OPTIN_GT50 RBL: IADB: Opt-in used more than 50% of the time



For the same message, Razor2 has a high score - which is correct:

2.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
 [cf: 100]
2.5 RAZOR2_CHECK   Listed in Razor2 (http://razor.sf.net/)
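
To see how much weight the whitelist carries in this report, the rule scores can be added up per group. This is a minimal Python sketch using only the scores quoted above; the `report` string and the `net_score` helper are illustrative names and are not part of SpamAssassin:

```python
# Scores copied from the report above; helper names are hypothetical.
report = """\
-1.5 RCVD_IN_IADB_OPTIN
-0.1 RCVD_IN_IADB_DK
-0.2 RCVD_IN_IADB_RDNS
-0.0 RCVD_IN_IADB_SENDERID
-2.2 RCVD_IN_IADB_VOUCHED
-0.1 RCVD_IN_IADB_SPF
-0.0 RCVD_IN_IADB_LISTED
-0.0 RCVD_IN_IADB_OPTIN_GT50
2.5 RAZOR2_CF_RANGE_51_100
2.5 RAZOR2_CHECK
"""

def net_score(text, prefix):
    """Sum the scores of all rules whose name starts with `prefix`."""
    total = 0.0
    for line in text.splitlines():
        score, rule = line.split()
        if rule.startswith(prefix):
            total += float(score)
    return round(total, 1)

iadb = net_score(report, "RCVD_IN_IADB_")
razor = net_score(report, "RAZOR2_")
print(iadb, razor, round(iadb + razor, 1))
```

Taken together, the IADB rules subtract 4.1 points, which cancels most of the 5.0 points that the two Razor2 rules add.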


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-05 Thread Sebastian Arcus


On 02/12/17 18:45, David Jones wrote:

On 12/02/2017 11:22 AM, Sebastian Arcus wrote:


On 02/12/17 13:06, Matus UHLAR - fantomas wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
 [212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
  [score: 0.4808]
0.8 MPART_ALT_DIFF BODY: HTML and text parts are different
0.0 HTML_MESSAGE   BODY: HTML included in message
2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
  [212.227.126.131 listed in list.dnswl.org]



On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

h maybe you should rethink those changes.


On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more 
spam for the type of email received at this particular site. That 
doesn't change the fact that 1.6 seems to me a pretty high score for 
a rule which would be triggered on such a large number of ham 
emails. Just saying.


You should understand that when you start tuning scores, you can get 
to hell very fast, unless you do your own mass-checks and tune according 
to them.


I'm not too sure I understand this attitude. The whole reason I 
started to tweak the scores for certain rules is that too much spam 
was going through. The false negatives have gone down considerably 
since I have altered the scores - and yes, I do keep an eye on them 
constantly and adjust depending on the number of false positive and 
negatives, and what triggers what. I also use network tests / RBL's as 
well and Bayes. The simple fact of the matter is that on plenty of 
spam emails, only one significant rule might get triggered - be it a 
high bayes score, one of the DNS RBL's or something else. If the rule 
doesn't have a high enough score, the email passes through.


Spammers change their tactics and content of their emails all the time 
- and the rule scores haven't been updated in months - because of the 
problems with the updating system (which is not a criticism - I 
understand the situation). So for people to advise sticking 
religiously to the default scores, well, frankly I don't get it.


The rulesets and dynamic scores in 72_scores.cf have been updating again 
for the past 2 weeks.


I recommend only changing a few of the default scores and make meta 
rules that combine the hits to add points when you see a pattern of 2 or 
more rules being hit.


If you add enough add-ons to your SA instance, then you shouldn't be 
impacted too much by the default scores.  SA has to be generic out of 
the box to cover all types of mail flow.  You have to tune it a bit for 
your particular recipients, language, and location.  See my email 
moments ago about tuning suggestions.


I used to constantly adjust scores to react to new spam campaigns but 
found I was always behind the spammers.  The more RBLs and meta rules 
you can set up, the more you can stay ahead of them.  Compromised 
accounts are the exception to this, with zero-hour spam that is very 
difficult to block, so try to keep that separate in your mind and not 
chase after those with score adjustments. These tend to stop 
automatically after 30 minutes or so when RBLs and DCC catch up to them 
or the account gets locked or its password changed.  I report these to 
Spamcop as quickly as I can.


Thank you David. Those are useful tips.
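
The meta-rule suggestion above can be pictured as follows. This is a Python sketch of the logic only, not SpamAssassin configuration; the group of rules, the 1.5 bonus and the `meta_bonus` name are assumptions chosen for illustration:

```python
# Hypothetical illustration of a meta rule: add extra points only when
# two or more rules from a chosen group hit the same message.
WATCHED = {"HTML_IMAGE_ONLY_24", "MPART_ALT_DIFF", "PYZOR_CHECK"}  # assumed group
BONUS = 1.5  # assumed score for the combined hit

def meta_bonus(hits, watched=WATCHED, minimum=2, bonus=BONUS):
    """Return the bonus if at least `minimum` watched rules fired, else 0."""
    matched = watched & set(hits)
    return bonus if len(matched) >= minimum else 0.0

print(meta_bonus(["HTML_IMAGE_ONLY_24", "HTML_MESSAGE"]))
print(meta_bonus(["HTML_IMAGE_ONLY_24", "PYZOR_CHECK", "BAYES_50"]))
```

In SpamAssassin itself the same idea is written as a `meta` rule in local.cf that combines existing rule names in a boolean or arithmetic expression, so that the individual default scores can stay untouched.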


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-02 Thread Sebastian Arcus


On 02/12/17 13:06, Matus UHLAR - fantomas wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
 [212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
  [score: 0.4808]
0.8 MPART_ALT_DIFF BODY: HTML and text parts are different
0.0 HTML_MESSAGE   BODY: HTML included in message
2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
  [212.227.126.131 listed in list.dnswl.org]



On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

h maybe you should rethink those changes.


On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just 
saying.


You should understand that when you start tuning scores, you can get to 
hell very fast, unless you do your own mass-checks and tune according to 
them.


I'm not too sure I understand this attitude. The whole reason I started 
to tweak the scores for certain rules is that too much spam was going 
through. The false negatives have gone down considerably since I have 
altered the scores - and yes, I do keep an eye on them constantly and 
adjust depending on the number of false positive and negatives, and what 
triggers what. I also use network tests / RBL's as well and Bayes. The 
simple fact of the matter is that on plenty of spam emails, only one 
significant rule might get triggered - be it a high bayes score, one of 
the DNS RBL's or something else. If the rule doesn't have a high enough 
score, the email passes through.


Spammers change their tactics and content of their emails all the time - 
and the rule scores haven't been updated in months - because of the 
problems with the updating system (which is not a criticism - I 
understand the situation). So for people to advise sticking religiously 
to the default scores, well, frankly I don't get it.


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-01 Thread Sebastian Arcus


On 01/12/17 10:54, Axb wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:


On 30/11/17 12:45, Matus UHLAR - fantomas wrote:

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and 
not everybody is a graphics expert to make their logo a tiny 
optimised gif or png - so some of these are slightly bigger than 
they should be.


However, this seems to be sufficiently wide spread. Also, many 
business emails can be just a few words reply - so the ratio of 
words to images triggers the filter in SA. Could the scores on 
HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there 
anything else to be done - aside from educating all the internet on 
optimising logos in the email signatures? :-)


Those have lower scores with BAYES and network rules enabled.
Configure BAYES and enable network rules...


Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
mean by network rules?). I still think that a score of 1.6 is quite a 
lot, considering that so many emails nowadays contain either an 
embedded logo in the signature, with just a few words (in a quick 
email reply, for example), or even images inserted, instead of 
attached to the email. Please see below an example of a SA report:


-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
 [212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
  [score: 0.4808]
0.8 MPART_ALT_DIFF BODY: HTML and text parts are different
0.0 HTML_MESSAGE   BODY: HTML included in message
2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
  [212.227.126.131 listed in list.dnswl.org]


you've changed SA default scores and now complain about one which hasn't 
been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

h maybe you should rethink those changes.


Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just saying.
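
The four numbers on each `score` line quoted above are SpamAssassin's four score sets: which one applies depends on whether Bayes and network tests are enabled. A small Python sketch of that selection follows; the `active_score` function name is hypothetical, and the values are the default lines quoted in this thread:

```python
# Default score lines as quoted in the thread.
DEFAULTS = {
    "PYZOR_CHECK": (0, 1.985, 0, 1.392),
    "BAYES_50": (0, 0, 2.0, 0.8),
}

def active_score(rule, bayes, net, table=DEFAULTS):
    """Pick the score for `rule` given which test families are enabled.

    Set 0: neither, set 1: network only, set 2: Bayes only, set 3: both.
    """
    index = (2 if bayes else 0) + (1 if net else 0)
    return table[rule][index]

print(active_score("PYZOR_CHECK", bayes=True, net=True))
print(active_score("BAYES_50", bayes=True, net=True))
```

With both Bayes and network tests enabled, the defaults for these two rules are 1.392 and 0.8, compared with the 2.5 and 2.0 shown in the report above.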


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-01 Thread Sebastian Arcus


On 30/11/17 12:45, Matus UHLAR - fantomas wrote:

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and not 
everybody is a graphics expert to make their logo a tiny optimised gif 
or png - so some of these are slightly bigger than they should be.


However, this seems to be sufficiently wide spread. Also, many 
business emails can be just a few words reply - so the ratio of words 
to images triggers the filter in SA. Could the scores on 
HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there anything 
else to be done - aside from educating all the internet on optimising 
logos in the email signatures? :-)


Those have lower scores with BAYES and network rules enabled.
Configure BAYES and enable network rules...


Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
mean by network rules?). I still think that a score of 1.6 is quite a 
lot, considering that so many emails nowadays contain either an embedded 
logo in the signature, with just a few words (in a quick email reply, 
for example), or even images inserted, instead of attached to the email. 
Please see below an example of a SA report:


-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
[212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
 [score: 0.4808]
0.8 MPART_ALT_DIFF BODY: HTML and text parts are different
0.0 HTML_MESSAGE   BODY: HTML included in message
2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
 [212.227.126.131 listed in list.dnswl.org]


HTML_IMAGE_ONLY_* generating too many FP's

2017-11-28 Thread Sebastian Arcus
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and not 
everybody is a graphics expert to make their logo a tiny optimised gif 
or png - so some of these are slightly bigger than they should be.


However, this seems to be sufficiently wide spread. Also, many business 
emails can be just a few words reply - so the ratio of words to images 
triggers the filter in SA. Could the scores on HTML_IMAGE_ONLY_* set of 
rules be lowered a bit - or is there anything else to be done - aside 
from educating all the internet on optimising logos in the email 
signatures? :-)


Re: The rise of highly targeted spam emails

2017-11-16 Thread Sebastian Arcus


On 16/11/17 12:16, Martin Gregorie wrote:

On Thu, 2017-11-16 at 09:15 +, Sebastian Arcus wrote:

On 15/11/17 18:11, Martin Gregorie wrote:

On Wed, 2017-11-15 at 14:44 +, Sebastian Arcus wrote:





I initially decided that an archive was A Good Thing to have,
simply because retrieving mail from it should be a lot faster than
searching through huge mail folders. This turned out to be true in
practice: the archive currently holds 183,000 emails and a worst
case search takes around 30 seconds to return a list of hits
(running on a 3 GHz dual Athlon system with 4GB RAM and Fedora 25
as its OS).


Thank you for the details. How do you search the archive? With grep
directly on the server?


Using SQL queries.

The two main tables in the database hold e-mail addresses and messages
respectively plus there are many-to-many links between the two that are
implemented with a third table that holds the link type ('To' or
'From') and an additional table containing subject text - this has a
one-to-many relationship with the messages.

The SA plugin just looks at the From header in the message being
checked and, if it finds that address in the database, sees if there
are any 'To' links associated with it. If there are, then the message
gets negative points. As I said, this SQL query is actually run against
a database view that combines the address and link tables. Since the
rows on these tables are small and the tables are indexed on address
and link type, the query is very fast.

If you want to know more about the archive, look here:
http://www.libelle-systems.c3487738.myzen.co.uk/mailarchive/

Ignore the licensing stuff: I initially thought I might be onto a
revenue source, but remarkably few people use mail archives. I should
remove the license management code and open source the archive but so
far haven't got round to doing that.


Thank you for the info. I haven't considered it before, but it makes 
sense to store large mail archives in SQL databases. I suppose it is one 
of the few ways to efficiently search such a large volume of data - much 
faster than searching Maildir or MBOX archives.


I guess one aspect that is less than ideal is the fact that it wouldn't 
be possible to give archive access to users through their normal mail 
software interface - such as Thunderbird for example.
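
The address, message and link tables described above, plus the view used for the lookup, might look roughly like this. It is only a sketch under stated assumptions: Python's built-in `sqlite3` stands in for the Postgres database, every table, column and function name is invented for illustration, and the subject table is left out:

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the Postgres archive.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE message (id INTEGER PRIMARY KEY, sent_on TEXT);
CREATE TABLE link (
    address_id INTEGER REFERENCES address(id),
    message_id INTEGER REFERENCES message(id),
    kind TEXT CHECK (kind IN ('To', 'From'))
);
CREATE INDEX link_by_address ON link (address_id, kind);
CREATE VIEW correspondent AS
    SELECT a.email, l.kind FROM address a JOIN link l ON l.address_id = a.id;
""")

# One archived outbound message from us to alice@example.com.
db.execute("INSERT INTO address (id, email) VALUES (1, 'alice@example.com')")
db.execute("INSERT INTO message (id, sent_on) VALUES (1, '2017-11-01')")
db.execute("INSERT INTO link VALUES (1, 1, 'To')")

def have_written_to(conn, sender):
    """True if the archive holds mail we sent 'To' this address."""
    row = conn.execute(
        "SELECT 1 FROM correspondent WHERE email = ? AND kind = 'To' LIMIT 1",
        (sender,),
    ).fetchone()
    return row is not None

print(have_written_to(db, "alice@example.com"))
print(have_written_to(db, "stranger@example.net"))
```

A plugin would run the equivalent of `have_written_to` against the From address of the incoming message and award negative points on a match. Because the query touches only the small address and link rows, the size of the message bodies does not affect it.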


Re: The rise of highly targeted spam emails

2017-11-16 Thread Sebastian Arcus


On 15/11/17 18:11, Martin Gregorie wrote:

On Wed, 2017-11-15 at 14:44 +, Sebastian Arcus wrote:




I initially decided that an archive was A Good Thing to have, simply
because retrieving mail from it should be a lot faster than searching
through huge mail folders. This turned out to be true in practice: the
archive currently holds 183,000 emails and a worst case search takes
around 30 seconds to return a list of hits (running on a 3 GHz dual
Athlon system with 4GB RAM and Fedora 25 as its OS).


Thank you for the details. How do you search the archive? With grep 
directly on the server?


Re: The rise of highly targeted spam emails

2017-11-16 Thread Sebastian Arcus

On 15/11/17 15:16, Reindl Harald wrote:



Am 15.11.2017 um 15:47 schrieb Sebastian Arcus:

On 15/11/17 09:56, Reindl Harald wrote:


Am 15.11.2017 um 09:41 schrieb Sebastian Arcus:
I can't really train the bayesian filter on these emails, as it 
would start to affect ham emails classification


this is an unproven claim!

we have phishing mails in bayes here which are classified with BAYES_99 
where my human eyes can hardly distinguish them from the original messages 
classified with BAYES_00 - you just need to train both and bayes will 
find the differences over time


I'm not sure I understand this? In my limited knowledge of how 
bayesian filters work, I assumed that if the words are the 
same/similar between emails, they should produce similar bayes scores, 
no? Do you have any links to explanations of how this would work - as 
I am keen not to affect the wrong way the bayes databases I built over 
time


bayes also takes headers into account as well as a lot of invisible 
stuff. fact is that we block all the DHL phishings which existed in the 
last years, and a short while ago i saw some apparently new ones with a 
foreign envelope/from address failing SPF, where a dhl.com server sent on 
behalf of the customer, and that thing was correctly classified with 
BAYES_00 even without whitelist_auth


and yes, i have QA scripts iterating over all the spam and ham samples 
collected since 2014, which test the current bayes classification, alert 
if spam does not get BAYES_99 or ham not BAYES_00, and in that case run 
"sa-retrain.sh sample-path", which makes 5 copies with some modified 
headers like message-id and retrains them


Interesting - thank you for the details. Is this your personal mailbox(es) 
- or a larger setup?




Re: The rise of highly targeted spam emails

2017-11-15 Thread Sebastian Arcus

On 15/11/17 09:56, Reindl Harald wrote:



Am 15.11.2017 um 09:41 schrieb Sebastian Arcus:
I can't really train the bayesian filter on these emails, as it would 
start to affect ham emails classification


this is an unproven claim!

we have phishing mails in bayes here which are classified with BAYES_99 
where my human eyes can hardly distinguish them from the original messages 
classified with BAYES_00 - you just need to train both and bayes will 
find the differences over time


I'm not sure I understand this? In my limited knowledge of how bayesian 
filters work, I assumed that if the words are the same/similar between 
emails, they should produce similar bayes scores, no? Do you have any 
links to explanations of how this would work - as I am keen not to 
affect the wrong way the bayes databases I built over time.
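
One way to picture the point about headers: a token that appears equally often in spam and ham ends up neutral, so training on look-alike messages mostly shifts the tokens that actually differ. The sketch below is a simplified Graham-style per-token estimate in Python, not SpamAssassin's actual Bayes implementation; the token names and the `H:` prefix for header-derived tokens are made up for illustration:

```python
from collections import Counter

def train(messages):
    """Count in how many messages each token appears."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts

def token_prob(token, spam, ham, n_spam, n_ham):
    """Spam probability of one token from relative frequencies."""
    s = spam[token] / n_spam
    h = ham[token] / n_ham
    return 0.5 if s + h == 0 else s / (s + h)

# Same body words in both classes; only a header-derived token differs.
spam_msgs = [["invoice", "payment", "attached", "H:relay=unknown.example"]] * 4
ham_msgs = [["invoice", "payment", "attached", "H:relay=mail.client.example"]] * 4

spam, ham = train(spam_msgs), train(ham_msgs)
for tok in ["invoice", "H:relay=unknown.example", "H:relay=mail.client.example"]:
    print(tok, token_prob(tok, spam, ham, len(spam_msgs), len(ham_msgs)))
```

In this toy data the shared body words come out at 0.5 and carry no signal either way, while the header tokens come out at 1.0 and 0.0 and decide the classification.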


Re: The rise of highly targeted spam emails

2017-11-15 Thread Sebastian Arcus

On 15/11/17 09:55, Martin Gregorie wrote:

On Wed, 2017-11-15 at 08:41 +, Sebastian Arcus wrote:


The emails often contain links to various popular cloud platforms -
such as SharePoint, DropBox etc. Most of the emails come from clean
domains, or from large webmail providers.


I'd say there is not a lot you can do if the legit solicitors and
accountants you and your clients deal with normally use these public
dropboxes to deliver documents.  OTOH, if they don't do that, then if
the mail claims to be from a solicitor or accountant you can use the
presence of those links as a spam recogniser, or go even further and treat
any link that *doesn't* point to the sender's own domain as a spam
indication.

Whether doing this is safe or not depends pretty much on what's in your
normal mail stream and on what is seen as normal practice for the
solicitors and accountants your users deal with.

I use a mail archive as another way of finding spam: anybody in the
archive who I've sent mail to gets tagged by a negative-scoring rule,
but this may not work for you and your users. However, system
performance isn't an issue. My archive is in a Postgres database and
the view it uses to recognise addresses that have received mail from my
domain is fast because my DB schema was designed to support this
type of query.


Thank you - that is an interesting idea. Do you use software to 
extract the emails from the Sent archives, or do you add them to the 
database on-the-fly, when the sent emails go out through your MTA? If 
you have any links or example scripts available I would be very much 
interested.


I suppose one side risk is that if the domain of one of your regular 
correspondents gets compromised, the spam coming from it will almost be 
guaranteed to arrive in the Inbox?
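
The suggestion above of treating any link that does not point to the sender's own domain as a spam indication could be prototyped along these lines. This is a Python sketch only; the `foreign_links` name, the simple URL pattern and the example addresses are assumptions, and a real check would also need to handle HTML, redirects and URL shorteners:

```python
import re
from urllib.parse import urlparse

def foreign_links(sender, body):
    """Return link hosts in `body` that are outside the sender's domain."""
    domain = sender.rsplit("@", 1)[-1].lower()
    hosts = []
    for url in re.findall(r"https?://[^\s<>\"']+", body):
        host = (urlparse(url).hostname or "").lower()
        if host and host != domain and not host.endswith("." + domain):
            hosts.append(host)
    return hosts

body = "Please review: https://www.dropbox.com/s/abc and https://docs.solicitor.example/x"
print(foreign_links("clerk@solicitor.example", body))
```

Whether a non-empty result should add points depends, as noted above, on whether the legitimate correspondents normally send such links themselves.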


The rise of highly targeted spam emails

2017-11-15 Thread Sebastian Arcus
I have noticed in the last half a year or so the rise in much more 
focused email campaigns. I have some solicitor and accountant clients 
who receive these scam emails which are a notch above the rest. The 
English is good and correctly spelled. The footers look professional and 
just like the ones from other offices in the trade. The wording is very 
similar to the usual emails they receive - such as reminders for 
payments, or an enquiry about documentation for the sale of a house. The 
emails often contain links to various popular cloud platforms - such as 
SharePoint, DropBox etc. Most of the emails come from clean domains, or 
from large webmail providers.


I can't really train the bayesian filter on these emails, as it would 
start to affect ham emails classification. I also assume that RBL's 
can't do much, as they would have to block everything from DropBox or 
SharePoint if they start blacklisting these emails.  Is there anything 
else that could be done to block this stuff? Have others seen these 
types of emails?


Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2017-09-26 Thread Sebastian Arcus

On 21/09/17 11:13, Zulma Pape wrote:
It means that your IP is greylisted at their end. There are many 
solutions to fix this issue, but the easiest and cheapest one is to get 
a new IP, refill the form and see their feedback about it. If it 
qualifies for mitigation then you'll start on friendly terms with them, 
and they'll build a new reputation on your history. If not, you can get a 
new IP and repeat the same steps until you get a friendly IP.


Thank you for the suggestions. I'm afraid we can't just keep on changing 
IP addresses, as there is other infrastructure tied to this IP address 
(vpn, external laptops etc.) - so it would involve quite a bit of 
reconfiguration. Also, I doubt that it would do much good, as we've had 
this IP address for 5 years - so it is clean. There is the possibility 
that Hotmail doesn't like our IP address because it is a 
consumer/ADSL/end-user IP - although I've removed it from the Spamhaus 
PBL database. I guess Hotmail must be using an internal database. In 
this case changing to another end-user IP wouldn't do much good.




Another solution is, since your volume is very low at the moment, it 
should be quite easy for you to ask from your list to add your Sender to 
their contact list. This will prevent your emails from going to junk 
folder, and at the same time this will increase the reputation of your IP.


I will ask a number of contacts to mark our emails as safe - who knows, 
maybe it will help. Thank you.




Re: MISSING_SUBJECT not triggered if subject contains whitespace

2017-09-21 Thread Sebastian Arcus


On 19/09/17 15:05, Kevin A. McGrail wrote:

On 9/19/2017 9:11 AM, David Jones wrote:
I have had these in place for years.  Maybe Kevin can consolidate and 
integrate this into his KAM.cf so I could remove them or we could 
eventually get them into the default SA ruleset after some testing. 


Hi David,

Thanks.  In addition to KAM.cf, I maintain a nonKAMrules.cf to which I've 
added these, with attribution, with the idea of testing them.  It's where 
I throw rules in the PD from lists and things like that, so I'm not 
claiming ownership but like the ideas.


Note, I lowered the score on the 1st two.  I'm pretty sure those might 
cause more FPs than intended.


https://www.pccc.com/downloads/SpamAssassin/contrib/nonKAMrules.cf


That looks like really useful stuff. Is it likely that any of these 
rules will make their way into SA - or should we include them ourselves 
in local.cf?


Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2017-09-21 Thread Sebastian Arcus


On 21/09/17 10:28, Zulma Pape wrote:


Here is the link to the forms I talked about. Good luck !

https://support.microsoft.com/en-us/getsupport?oaspworkflow=start_1.0.0.0=capsub=edfsmsbl3=en-us=635622755123113400


Thank you for that - I've just managed to find that form in the maze of 
the MS website about an hour ago. I've filled it out and submitted it - and 
just received an email saying that the IP address doesn't qualify for 
mitigation. I'm not sure if that means that the IP address is already 
clean at their end, or it is blacklisted or greylisted, but they don't 
want to unblock it.






On Thu, Sep 21, 2017 at 8:40 AM, Sebastian Arcus <s.ar...@open-t.co.uk> 
wrote:


On 19/09/17 10:29, Zulma Pape wrote:

There are tons of ways to get your IP a good reputation with
Hotmail.

Start setting up the SNDS, this will help you monitor your
reputation directly with Microsoft.


Hi - thank you for the suggestions. I have signed up for the SNDS
programme - which looks potentially useful. Unfortunately, SNDS does
not show mail and spam traffic stats for IP addresses sending less
than 100 mails per day - which seems to be the case of this
particular site. The IP Status page lists our IP as having normal
status - so I guess it's all good there.


You should also try filling their support forms, they will check
your IP's historic reputation and act accordingly to it, and
since your background is good, the feedback should be positive
for you.


I have seen references to this form on various historic forum posts
- but the links I followed are all dead. Has this form been removed?





On Tue, Sep 19, 2017 at 7:25 AM, Sebastian Arcus
<s.ar...@open-t.co.uk> wrote:

     This is a bit off topic as it is not directly related to SA, but I'm
     hoping that with the email and spam expertise on this group, someone
     might throw in a useful idea - which would be much appreciated.

     I have this problem on one site where most emails we send to
     Hotmail/Outlook.com/Live.com email addresses end up in Junk at the
     recipient's end. Things I have tried:

     1. I've setup SPF, DKIM, DMARC (and set it to 'reject').

     2. We used to smart relay outbound email through the hosting
     provider (1and1), but now changed to send directly from our own IP
     address, so that we can control the reputation of the sending IP -
     no change.

     3. I've checked our public IP and the domain name at mxtoolbox.com
     - all tests pass (the public IP has been delisted from the Spamhaus
     non-MX/end-user IP database).

     4. I've setup forward and reverse DNS entries for our IP address.

     5. I've checked with all DNS blocklists/blacklists I could find -
     our domain or IP address is not flagged up anywhere.

     6. This is a small network which I've been managing for years - the
     domain name has not been used to send marketing/lists email of any
     sort - so the historic reputation should be fine.

     7. I've setup a monitor and block on port 25 outbound on the network
     firewall - in case there is a trojan on a machine on the network
     sending out spam and ruining the reputation of our IP - it's never
     been triggered.

     8. I've checked the contents of outgoing emails - this is an
     accountants practice - the email content is standard, there is
     nothing there which should trigger bayesian filters.

     9. I've sent emails to other servers under my control running SA -
     the scores come out perfect at the receiving end.

     10. The emails we send are operational and notices emails to
     customers - who need them. They call on the phone and complain they
     haven't received them - just to discover they were sent, but ended
     up in the junk.

     11. Emails we send to any other domains are never a problem
     spam-wise.

     I can't really think of anything else to try - have I missed
     anything? Are Hotmail/Outlook.com spam filters a complete lottery?





Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2017-09-21 Thread Sebastian Arcus


On 19/09/17 17:17, Jerry Malcolm wrote:
My recommendation as a first step is to go to mail-tester.com. They will 
tell you to send an email to a temp email address, and they will analyze 
and grade your email as to 'spammy-ness'. Outlook, gmail, etc were 
flagging a lot of my emails.  After I finally fixed everything and got 
mail-tester.com to give me a perfect score, I haven't had any problem 
with getting flagged.


Hi - and thanks for the suggestion. I've tried in the past another 
similar service - and now I've tried mail-tester.com - it returned a 
score of 10/10






Jerry


On 9/19/2017 1:44 AM, G Roach wrote:
Microsoft use their own methods of detection, including ones based on 
reputation and 'length of service' - i.e., if you have only just started 
sending emails out from your own address (which you have) then they 
may well consider you suspicious. There's not much you can do about it. 
More info here: https://mail.live.com/mail/troubleshooting.aspx




On 19/09/2017 07:25, Sebastian Arcus wrote:
This is a bit off topic as it is not directly related to SA, but I'm 
hoping that with the email and spam expertise on this group, someone 
might throw in a useful idea - which would be much appreciated.


I have this problem on one site where most emails we send to 
Hotmail/Outlook.com/Live.com email addresses end up in Junk at the 
recipient's end. Things I have tried:


1. I've setup SPF, DKIM, DMARC (and set it to 'reject').

2. We used to smart relay outbound email through the hosting provider 
(1and1), but now changed to send directly from our own IP address, so 
that we can control the reputation of the sending IP - no change.


3. I've checked our public IP and the domain name at mxtoolbox.com - 
all tests pass (the public IP has been delisted from the Spamhaus 
non-MX/end-user IP database).


4. I've setup forward and reverse DNS entries for our IP address.

5. I've checked with all DNS blocklists/blacklists I could find - our 
domain or IP address is not flagged up anywhere.


6. This is a small network which I've been managing for years - the 
domain name has not been used to send marketing/lists email of any 
sort - so the historic reputation should be fine.


7. I've setup a monitor and block on port 25 outbound on the network 
firewall - in case there is a trojan on a machine on the network 
sending out spam and ruining the reputation of our IP - it's never 
been triggered.


8. I've checked the contents of outgoing emails - this is an 
accountants practice - the email content is standard, there is 
nothing there which should trigger bayesian filters.


9. I've sent emails to other servers under my control running SA - 
the scores come out perfect at the receiving end.


10. The emails we send are operational and notices emails to 
customers - who need them. They call on the phone and complain they 
haven't received them - just to discover they were sent, but ended up 
in the junk.


11. Emails we send to any other domains are never a problem spam-wise.

I can't really think of anything else to try - have I missed 
anything? Are Hotmail/Outlook.com spam filters a complete lottery?
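
Item 1 in the list above mentions a DMARC policy of 'reject'. The policy is published in the `p=` tag of the domain's `_dmarc` TXT record. The Python sketch below reads it from a record string that has already been fetched; the `parse_dmarc` helper and the sample record are hypothetical and are not the poster's actual DNS data:

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

# Hypothetical record for illustration only.
record = "v=DMARC1; p=reject; rua=mailto:postmaster@example.org"
policy = parse_dmarc(record).get("p")
print(policy)
```

Checking the published value this way confirms what receivers are asked to do with mail that fails both SPF and DKIM alignment. It does not reveal how a particular receiver such as Hotmail weighs that result.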






Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2017-09-21 Thread Sebastian Arcus

On 19/09/17 10:29, Zulma Pape wrote:

There are tons of ways to get your IP a good reputation with Hotmail.

Start setting up the SNDS, this will help you monitor your reputation 
directly with Microsoft.


Hi - thank you for the suggestions. I have signed up for the SNDS 
programme - which looks potentially useful. Unfortunately, SNDS does not 
show mail and spam traffic stats for IP addresses sending less than 100 
mails per day - which seems to be the case of this particular site. The 
IP Status page lists our IP as having normal status - so I guess it's 
all good there.




You should also try filling their support forms, they will check your 
IP's historic reputation and act accordingly to it, and since your 
background is good, the feedback should be positive for you.


I have seen references to this form on various historic forum posts - 
but the links I followed are all dead. Has this form been removed?







On Tue, Sep 19, 2017 at 7:25 AM, Sebastian Arcus <s.ar...@open-t.co.uk> wrote:


This is a bit off topic as it is not directly related to SA, but I'm
hoping that with the email and spam expertise on this group, someone
might throw in a useful idea - which would be much appreciated.

I have this problem on one site where most emails we send to
Hotmail/Outlook.com/Live.com email addresses end up in Junk at the
recipient's end. Things I have tried:

1. I've setup SPF, DKIM, DMARC (and set it to 'reject').

2. We used to smart relay outbound email through the hosting
provider (1and1), but now changed to send directly from our own IP
address, so that we can control the reputation of the sending IP -
no change.

3. I've checked our public IP and the domain name at mxtoolbox.com -
all tests pass (the public IP has been delisted from the Spamhaus
non-MX/end-user IP database).

4. I've setup forward and reverse DNS entries for our IP address.

5. I've checked with all DNS blocklists/blacklists I could find -
our domain or IP address is not flagged up anywhere.

6. This is a small network which I've been managing for years - the
domain name has not been used to send marketing/lists email of any
sort - so the historic reputation should be fine.

7. I've setup a monitor and block on port 25 outbound on the network
firewall - in case there is a trojan on a machine on the network
sending out spam and ruining the reputation of our IP - it's never
been triggered.

8. I've checked the contents of outgoing emails - this is an
accountants practice - the email content is standard, there is
nothing there which should trigger bayesian filters.

9. I've sent emails to other servers under my control running SA -
the scores come out perfect at the receiving end.

10. The emails we send are operational and notices emails to
customers - who need them. They call on the phone and complain they
haven't received them - just to discover they were sent, but ended
up in the junk.

11. Emails we send to any other domains are never a problem spam-wise.

I can't really think of anything else to try - have I missed
anything? Are Hotmail/Outlook.com spam filters a complete lottery?




MISSING_SUBJECT not triggered if subject contains whitespace

2017-09-19 Thread Sebastian Arcus
I've had a number of emails with no subject not triggering the 
MISSING_SUBJECT rule - only to discover that the spammers have added a 
white space after 'Subject:' - which appears to fool the code into 
thinking that there is an actual subject. Would it be possible to 
'smarten up' the code a bit to recognise this?
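
A minimal sketch of the check being asked for, written in Python rather than as an SA rule. It is not how MISSING_SUBJECT is implemented; the function name and the sample messages are made up for illustration. The idea is simply to treat a Subject header containing only whitespace the same as a missing one.

```python
from email import message_from_string


def subject_is_effectively_missing(raw_message: str) -> bool:
    """Return True if there is no Subject header or it holds only whitespace."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject")
    # A header such as "Subject: " parses to an empty or whitespace-only value.
    return subject is None or subject.strip() == ""


# Hypothetical sample messages, for illustration only.
no_header = "From: a@example.com\n\nbody\n"
blank_subject = "From: a@example.com\nSubject: \n\nbody\n"
real_subject = "From: a@example.com\nSubject: Invoice\n\nbody\n"
```

With these samples, the first two messages are reported as having no usable subject and the third is not.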


OT - Hotmail/Outlook.com marking most of our email as Junk

2017-09-19 Thread Sebastian Arcus
This is a bit off topic as it is not directly related to SA, but I'm 
hoping that with the email and spam expertise on this group, someone 
might throw in a useful idea - which would be much appreciated.


I have this problem on one site where most emails we send to 
Hotmail/Outlook.com/Live.com email addresses end up in Junk at the 
recipient's end. Things I have tried:


1. I've setup SPF, DKIM, DMARC (and set it to 'reject').

2. We used to smart relay outbound email through the hosting provider 
(1and1), but now changed to send directly from our own IP address, so 
that we can control the reputation of the sending IP - no change.


3. I've checked our public IP and the domain name at mxtoolbox.com - all 
tests pass (the public IP has been delisted from the Spamhaus 
non-MX/end-user IP database).


4. I've setup forward and reverse DNS entries for our IP address.

5. I've checked with all DNS blocklists/blacklists I could find - our 
domain or IP address is not flagged up anywhere.


6. This is a small network which I've been managing for years - the 
domain name has not been used to send marketing/lists email of any sort 
- so the historic reputation should be fine.


7. I've setup a monitor and block on port 25 outbound on the network 
firewall - in case there is a trojan on a machine on the network sending 
out spam and ruining the reputation of our IP - it's never been triggered.


8. I've checked the contents of outgoing emails - this is an accountants 
practice - the email content is standard, there is nothing there which 
should trigger bayesian filters.


9. I've sent emails to other servers under my control running SA - the 
scores come out perfect at the receiving end.


10. The emails we send are operational and notices emails to customers - 
who need them. They call on the phone and complain they haven't received 
them - just to discover they were sent, but ended up in the junk.


11. Emails we send to any other domains are never a problem spam-wise.

I can't really think of anything else to try - have I missed anything? 
Are Hotmail/Outlook.com spam filters a complete lottery?


Re: FORGED_YAHOO_RCVD still causing false positives

2017-09-15 Thread Sebastian Arcus


On 15/09/17 14:34, Kevin A. McGrail wrote:

On 9/15/2017 8:26 AM, RW wrote:

The rule was created and scored when spoofing Yahoo was very common,
but it isn't any more. I don't think it's worth keeping as it is - high
maintenance and error prone.


Agreed.  Score FORGED_YAHOO_RCVD to zero locally and will get a bug open 
to deprecate it.


Regards,

KAM


Much appreciated - thank you both!


Re: SA not receiving fixed FORGED_MUA_MOZILLA update?

2017-09-15 Thread Sebastian Arcus

On 15/09/17 12:21, Kevin A. McGrail wrote:

On 9/15/2017 6:54 AM, Sebastian Arcus wrote:
Thank you for the reply. Does that mean that no new rules have been 
pushed to SA installations in the past 5 months - or only some rules 
get pushed through?


The system has been "down" since March 15 in that everything is working 
but we are purposefully not changing the DNS entries.


We've resurrected it a few times and Dave Jones has done some work to 
get a new system running but it published incorrect score files.  He did 
some massaging and published a new rule set with the old score file a 
few months ago.  But since then we've been battling the machine randomly 
hanging.  And work to resurrect the old boxes failed.


We are looking for some little darn insidious issue not processing rule 
scores right.


Thank you for the update Kevin - and for the hard work of everyone 
involved. Let's hope the rules updates will be operational again soon.


Re: SA not receiving fixed FORGED_MUA_MOZILLA update?

2017-09-15 Thread Sebastian Arcus

On 15/09/17 11:41, Kevin A. McGrail wrote:

On 9/15/2017 6:11 AM, Sebastian Arcus wrote:
I am having problems with false positives for FORGED_MUA_MOZILLA for 
Yahoo emails. I see this has been already dealt with here and pushed 
to the 3.4 and trunk branches:


https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7411

However, even after running sa-update, the file 20_meta_tests.cf still 
doesn't have the changes on my servers. Has this bugfix not been 
applied for some reason?


20_meta_tests.cf on my machines is dated May 17 and June 24 for the 
3.00xx and 4.00xx dirs under /var/lib/spamassassin - which is after 
the date the bugfix was pushed (2017-04-22).


I run SA 3.4.1 and 4.0.0 on different machines


Hi Sebastian,

The Rule Promotion and Generation of tarballs with correct scores has 
been an ongoing issue for the SpamAssassin team.  I would suggest you 
simply copy the rule to your local.cf at this time.


Hi Kevin,

Thank you for the reply. Does that mean that no new rules have been 
pushed to SA installations in the past 5 months - or only some rules get 
pushed through?


FORGED_YAHOO_RCVD still causing false positives

2017-09-15 Thread Sebastian Arcus
I see this has come up again and again. Since FORGED_YAHOO_RCVD seems to 
work by checking the address of the Yahoo smtp server in the headers 
against a predefined list of Yahoo servers in SA, and Yahoo seems to add 
new servers all the time - which causes false positives, is there much 
point to this check?


If not, maybe the default score should be lowered at least to something 
like 0.2 or 0.3 (I think it is at 1.5 at the moment).


SA not receiving fixed FORGED_MUA_MOZILLA update?

2017-09-15 Thread Sebastian Arcus
I am having problems with false positives for FORGED_MUA_MOZILLA for 
Yahoo emails. I see this has been already dealt with here and pushed to 
the 3.4 and trunk branches:


https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7411

However, even after running sa-update, the file 20_meta_tests.cf still 
doesn't have the changes on my servers. Has this bugfix not been applied 
for some reason?


20_meta_tests.cf on my machines is dated May 17 and June 24 for the 
3.00xx and 4.00xx dirs under /var/lib/spamassassin - which is after the 
date the bugfix was pushed (2017-04-22).


I run SA 3.4.1 and 4.0.0 on different machines


Re: In anyone else getting 325KB spams from cont...@cron-job.org?

2017-09-14 Thread Sebastian Arcus


On 14/09/17 19:59, Loren Wilton wrote:

Should be easy to block.  Just block the cron-job.org domain.


As someone else mentioned that address is an obvious joe-job. And 
scoring it high doesn't help that much. It worked for the first few 
weeks, then they went to contact@ to presumably get 
around that. I was surprised to see in the last few that they had gone 
back to the cron-job.org domain for the fake sender.


For some reason these are bypassing SA on my system, I suspect due to 
the size.


I had to add on my systems a while ago an 
/etc/mail/spamassassin/spamc.conf containing:


-s 200

to increase the maximum size of emails passed to SA. It seems some 
spammers have cottoned onto the fact that 256KB is still hardwired 
somewhere in SA, and started sending spam just above that threshold to 
bypass the filter.
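
To illustrate why spam sized just above the limit slips through, here is a small Python sketch. It assumes the client simply skips scanning for any message larger than a configured maximum; the function name is hypothetical, and the 256KB and 325KB figures are taken from this thread, not from the SA source.

```python
def is_passed_to_filter(message_size_bytes: int, max_size_bytes: int) -> bool:
    """Assumed behaviour: messages larger than the limit are not scanned at all."""
    return message_size_bytes <= max_size_bytes


KB = 1024
spam_size = 325 * KB        # size of the spam discussed in this thread
assumed_limit = 256 * KB    # limit mentioned above; an assumption, not verified
raised_limit = 512 * KB     # hypothetical larger limit
```

Under that assumption the 325KB message is skipped with the 256KB limit and scanned once the limit is raised above its size.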


Re: Config option to skip pyzor check on empty body emails?

2017-09-12 Thread Sebastian Arcus


On 12/09/17 12:33, RW wrote:

On Tue, 12 Sep 2017 08:41:01 +0100
Sebastian Arcus wrote:



The confusing part is that left to its devices, Pyzor creates
a .pyzor dir in the home dir of the user it is run as. But if
--homedir is specified, it dumps stuff directly there, instead of
creating a .pyzor dir. In the end I got rid of the "pyzor_options
--homedir" option in local.cf and it worked fine.


It is a bit confusing, but it's not that the .pyzor directory is used
inconsistently, it's that pyzor defines

   --homedir=HOMEDIR configuration directory

so the default homedir is $HOME/.pyzor/ not $HOME/.

If you want to use  pyzor_options you could use:

   pyzor_options  --homedir /var/spool/spamd/.pyzor


Like with everything, it all makes sense after you fully understand
what's going on :-) I just made the wrong assumptions about how the
option would work. Like Ian says, the word "home" in the option name
makes it easy to assume that everything will be arranged as
subdirectories under it. No matter - I'm happy I've finally found a
solution to the empty bodied emails hitting PYZOR_CHECK :-)

Thanks again for all the help.
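
The path behaviour described in this thread can be sketched in a few lines of Python. This only models what was observed above (the whitelist lives directly inside the configuration directory, which defaults to $HOME/.pyzor); it is not Pyzor's own code, and the function name is made up.

```python
from pathlib import PurePosixPath
from typing import Optional


def pyzor_whitelist_path(user_home: str, homedir_option: Optional[str] = None) -> str:
    """Model where the local whitelist ends up, as observed in this thread."""
    if homedir_option:
        # --homedir names the configuration directory itself.
        config_dir = PurePosixPath(homedir_option)
    else:
        # Without the option, the configuration directory is $HOME/.pyzor.
        config_dir = PurePosixPath(user_home) / ".pyzor"
    return str(config_dir / "whitelist")
```

So passing --homedir /var/spool/spamd gives /var/spool/spamd/whitelist, while passing --homedir /var/spool/spamd/.pyzor points at the same file that is used when the option is left out.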


Re: Config option to skip pyzor check on empty body emails?

2017-09-12 Thread Sebastian Arcus

On 12/09/17 00:56, RW wrote:

On Tue, 12 Sep 2017 00:37:40 +0100
Sebastian Arcus wrote:


On 11/09/17 20:20, RW wrote:



This is why pyzor has the  local_whitelist command. At very least
it's a good idea to pipe an empty string through
"pyzor local_whitelist" (probably as the user running
spamassassin).


I have spotted that command in the docs - and if it worked, it would
seem like a good solution. But it doesn't seem to. I have added the
hash of the empty string to the local whitelist. If I try to re-add
the same hash, or the hash of the problem emails - I get a message
stating that it is already in the whitelist - so it would appear to
be working. But when running the email message through SA, it still
hits PYZOR_CHECK. I have found the location of Pyzor's local
whitelist - and the permissions are correct. It appears that SA
completely ignores the fact that the digest is whitelisted locally:


SA can't ignore it, if a hash is whitelisted pyzor returns a dummy
result.  e.g.:

$ echo "" | pyzor check
public.pyzor.org:24441  (200, 'OK') 0   0

compared with:

$ echo "" | pyzor --local-whitelist=/nonextistent check
public.pyzor.org:24441  (200, 'OK') 2749671 82562


Thank you for that. I have finally got to the bottom of my problem. It was 
the Pyzor homedir. Although I have set it up in 
/etc/mail/spamassassin/local.cf, I ended up confusing myself. If I ran 
as root:


   #pyzor local_whitelist < /email.eml

it placed the whitelist in /root/.pyzor/whitelist

When I ran:

   #su - spamd -c "pyzor local_whitelist < /email.eml"

it placed it in /var/spool/spamd/.pyzor/whitelist (/var/spool/spamd is 
the homedir of the 'spamd' user on this system)


But when I ran:

   #su - spamd -c "pyzor --homedir /var/spool/spamd local_whitelist < /email.eml"

it placed it in /var/spool/spamd/whitelist

The confusing part is that left to its devices, Pyzor creates a .pyzor 
dir in the home dir of the user it is run as. But if --homedir is 
specified, it dumps stuff directly there, instead of creating a .pyzor 
dir. In the end I got rid of the "pyzor_options --homedir" option in 
local.cf and it worked fine. I was just tying myself in knots there :-)


Thanks again


Re: Config option to skip pyzor check on empty body emails?

2017-09-11 Thread Sebastian Arcus


On 11/09/17 20:20, RW wrote:

On Mon, 11 Sep 2017 17:39:16 +0100
Sebastian Arcus wrote:


Is there any way to tell SA to skip pyzor checks on emails with an
empty body (even if there are attachments). I've noticed for a while
now that emails which don't contain any text in their bodies seem to
automatically trigger PYZOR_CHECK (even if they have an attachment) -
although they are private emails so can't possibly match the digest
of spam emails. I can only guess that Pyzor matches the digest of
empty emails automatically.


It's because pyzor is based only on a simplified version of the body
text. This includes stripping any URIs or email addresses from the text.

It's not just emails with no body text there are also variants of
this that reduce to common phrases such as "Sent from my iPhone"
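
A rough Python illustration of why unrelated messages can share a digest. This is NOT Pyzor's real normalisation or hashing; the regular expressions, the whitespace handling and the use of SHA-1 are assumptions chosen only to show the effect of hashing a simplified body.

```python
import hashlib
import re

# Simplified patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"\S+@\S+")


def simplified_digest(body: str) -> str:
    """Hash the body after dropping URLs, email addresses and whitespace."""
    text = URL_RE.sub("", body)
    text = EMAIL_RE.sub("", text)
    text = re.sub(r"\s+", "", text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


empty = simplified_digest("")
only_url = simplified_digest("  \n http://example.com/x \n")
only_address = simplified_digest("bob@example.com\n")
real_text = simplified_digest("Please find the accounts attached.")
```

Here the empty body, the URL-only body and the address-only body all reduce to the same digest, so a report against one of them counts against all of them.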


  I have clients who receive important
emails from their customers just with an attachment and a subject
line - and they all seem to go to Junk - because they trigger the
PYZOR_CHECK rule - which is causing problems. Any way to deal with
this?


This is why pyzor has the  local_whitelist command. At very least it's
a good idea to pipe an empty string through
"pyzor local_whitelist" (probably as the user running spamassassin).


I have spotted that command in the docs - and if it worked, it would 
seem like a good solution. But it doesn't seem to. I have added the hash 
of the empty string to the local whitelist. If I try to re-add the same 
hash, or the hash of the problem emails - I get a message stating that 
it is already in the whitelist - so it would appear to be working. But 
when running the email message through SA, it still hits PYZOR_CHECK. I 
have found the location of Pyzor's local whitelist - and the permissions 
are correct. It appears that SA completely ignores the fact that the 
digest is whitelisted locally:



su - spamd -c "spamassassin -D 2>&1 < /test1.eml" | grep -i pyzor
Sep 12 00:31:49.080 [23559] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Pyzor from @INC

Sep 12 00:31:49.090 [23559] dbg: pyzor: network tests on, attempting Pyzor
Sep 12 00:31:50.679 [23559] dbg: config: fixed relative path: 
/var/lib/spamassassin/3.004001/updates_spamassassin_org/25_pyzor.cf
Sep 12 00:31:50.679 [23559] dbg: config: using 
"/var/lib/spamassassin/3.004001/updates_spamassassin_org/25_pyzor.cf" 
for included file
Sep 12 00:31:50.680 [23559] dbg: config: read file 
/var/lib/spamassassin/3.004001/updates_spamassassin_org/25_pyzor.cf
Sep 12 00:31:57.411 [23559] dbg: util: executable for pyzor was found at 
/usr/bin/pyzor

Sep 12 00:31:57.412 [23559] dbg: pyzor: pyzor is available: /usr/bin/pyzor
Sep 12 00:31:57.413 [23559] dbg: pyzor: opening pipe: /usr/bin/pyzor 
--homedir /var/spool/spamd check < /tmp/.spamassassin23559DIrl4Ktmp

Sep 12 00:31:58.154 [23559] dbg: pyzor: [23560] finished: exit 1
Sep 12 00:31:58.155 [23559] dbg: pyzor: got response: 
public.pyzor.org:24441 (200, 'OK') 2749542 82562
Sep 12 00:31:58.156 [23559] dbg: check: tagrun - tag PYZOR is now ready, 
value: Whitelisted.
Sep 12 00:31:58.157 [23559] dbg: pyzor: listed: COUNT=2749542/5 
WHITELIST=82562
Sep 12 00:31:58.159 [23559] dbg: rules: ran eval rule PYZOR_CHECK 
==> got hit (1)



*  2.5 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
  2.5 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
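
For reading responses like the one in the log above, here is a small Python sketch that pulls out the two counts. The regular expression is written against the lines shown in this thread only, and the ">= threshold" comparison is an assumption suggested by "COUNT=2749542/5" in the debug output, not taken from the plugin source.

```python
import re
from typing import Optional, Tuple

# Matches e.g.: public.pyzor.org:24441 (200, 'OK') 2749542 82562
RESPONSE_RE = re.compile(r"^\S+\s+\((\d+),\s*'([^']*)'\)\s+(\d+)\s+(\d+)\s*$")


def parse_pyzor_check(line: str) -> Optional[Tuple[int, int]]:
    """Return (report_count, whitelist_count), or None if the line is not a 200 reply."""
    match = RESPONSE_RE.match(line)
    if not match or match.group(1) != "200":
        return None
    return int(match.group(3)), int(match.group(4))


def would_hit(report_count: int, threshold: int = 5) -> bool:
    """Assumed decision: a hit when the report count reaches the threshold."""
    return report_count >= threshold


from_log = parse_pyzor_check("public.pyzor.org:24441 (200, 'OK') 2749542 82562")
whitelisted = parse_pyzor_check("public.pyzor.org:24441  (200, 'OK') 0   0")
```

A locally whitelisted digest should come back as 0 and 0, so a large report count in the SA debug output suggests the local whitelist was not consulted for that run.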


Config option to skip pyzor check on empty body emails?

2017-09-11 Thread Sebastian Arcus
Is there any way to tell SA to skip pyzor checks on emails with an empty 
body (even if there are attachments). I've noticed for a while now that 
emails which don't contain any text in their bodies seem to 
automatically trigger PYZOR_CHECK (even if they have an attachment) - 
although they are private emails so can't possibly match the digest of 
spam emails. I can only guess that Pyzor matches the digest of empty 
emails automatically. I have clients who receive important emails from 
their customers just with an attachment and a subject line - and they 
all seem to go to Junk - because they trigger the PYZOR_CHECK rule - 
which is causing problems. Any way to deal with this?


Re: SA not performing DNSBL queries correctly

2017-05-20 Thread Sebastian Arcus


On 17/05/17 18:11, Sebastian Arcus wrote:




On 17/05/17 16:53, David Mehler wrote:

Hi,

I don't see your SA issue here, but since you're running 3.4.1 can I get
a look at your SA configuration to compare against mine?
Thanks.
Dave.


Yes - you are correct. As I pointed out in my last email, it looks like 
there might be an issue with the package supplied by Slackware at 
slackbuilds.org - and I am chasing it up with them there. But thanks to 
the advice on this list, I've managed to narrow things down - so I am 
grateful for the hints.



Just a follow-up and clarification on this issue - after more testing, 
it seems that it was the Spamassassin version which was the problem. I 
have had to upgrade SA on 7 servers running 3.4.1 on Slackware - as the 
dns rbl's weren't working on any of them. The only server I had with SA 
3.4.0 *was* actually working correctly. After upgrading all the boxes to 
4.0.0, the dns rbl's are now working correctly. I have *not* changed any 
configuration options in SA - I left all the servers as they were in 
this respect - so it seems it was not a configuration issue.


I'm afraid I haven't been able to narrow it down further than this. The 
servers were all running various kernels, both x86 and x86_64 
architectures, and several different versions of Perl - so I would guess 
the SA version was the common factor and the likely culprit.




Re: SA not performing DNSBL queries correctly

2017-05-17 Thread Sebastian Arcus




On 17/05/17 16:53, David Mehler wrote:

Hi,

I don't see your SA issue here, but since you're running 3.4.1 can I get
a look at your SA configuration to compare against mine?
Thanks.
Dave.


Yes - you are correct. As I pointed out in my last email, it looks like 
there might be an issue with the package supplied by Slackware at 
slackbuilds.org - and I am chasing it up with them there. But thanks to 
the advice on this list, I've managed to narrow things down - so I am 
grateful for the hints.








On 5/17/17, Sebastian Arcus <s.ar...@open-t.co.uk> wrote:

On 17/05/17 14:54, Sebastian Arcus wrote:

On 17/05/17 14:21, Kevin A. McGrail wrote:

On 5/17/2017 8:22 AM, Sebastian Arcus wrote:

I have 2 servers with SA 3.4.1 running on Slackware, with Bind in
caching/recursive mode. For months one of them has been unable to
correctly do dns blocklists (but the queries are not blocked). I have
pored over the logs, and the main difference is that, although both
of them pick up on the bad urls in the body of the message, the bad
server is unable to resolve the url to an IP address for some reason
(but dig works fine on the command line on both servers):

What version of Net::DNS on the two boxes?  Does the 3.4 branch from
SVN work?

There have been changes to Net::DNS that are my likely first guess.


Thank you for the suggestion. I have Net::DNS 1.10. I have just
recompiled SA from SVN and it is using dnsrbl's correctly. Have there
been some changes in the way SA works recently?


A small update to this - I recompiled 3.4.1 by hand - and this is
working fine as well. This would suggest that the Slackware package is
somehow the problem - unless it is all coincidental and I am somehow
chasing my own tail. I will update here if I find out more. Thank you
again for the suggestion.



Re: SA not performing DNSBL queries correctly

2017-05-17 Thread Sebastian Arcus

On 17/05/17 14:54, Sebastian Arcus wrote:

On 17/05/17 14:21, Kevin A. McGrail wrote:

On 5/17/2017 8:22 AM, Sebastian Arcus wrote:
I have 2 servers with SA 3.4.1 running on Slackware, with Bind in 
caching/recursive mode. For months one of them has been unable to 
correctly do dns blocklists (but the queries are not blocked). I have 
pored over the logs, and the main difference is that, although both 
of them pick up on the bad urls in the body of the message, the bad 
server is unable to resolve the url to an IP address for some reason 
(but dig works fine on the command line on both servers):

What version of Net::DNS on the two boxes?  Does the 3.4 branch from 
SVN work?


There have been changes to Net::DNS that are my likely first guess.


Thank you for the suggestion. I have Net::DNS 1.10. I have just 
recompiled SA from SVN and it is using dnsrbl's correctly. Have there 
been some changes in the way SA works recently?


A small update to this - I recompiled 3.4.1 by hand - and this is 
working fine as well. This would suggest that the Slackware package is 
somehow the problem - unless it is all coincidental and I am somehow 
chasing my own tail. I will update here if I find out more. Thank you 
again for the suggestion.


Re: SA not performing DNSBL queries correctly

2017-05-17 Thread Sebastian Arcus

On 17/05/17 14:21, Kevin A. McGrail wrote:

On 5/17/2017 8:22 AM, Sebastian Arcus wrote:
I have 2 servers with SA 3.4.1 running on Slackware, with Bind in 
caching/recursive mode. For months one of them has been unable to 
correctly do dns blocklists (but the queries are not blocked). I have 
pored over the logs, and the main difference is that, although both of 
them pick up on the bad urls in the body of the message, the bad 
server is unable to resolve the url to an IP address for some reason 
(but dig works fine on the command line on both servers):

What version of Net::DNS on the two boxes?  Does the 3.4 branch from SVN 
work?


There have been changes to Net::DNS that are my likely first guess.


Thank you for the suggestion. I have Net::DNS 1.10. I have just 
recompiled SA from SVN and it is using dnsrbl's correctly. Have there 
been some changes in the way SA works recently?


SA not performing DNSBL queries correctly

2017-05-17 Thread Sebastian Arcus
I have 2 servers with SA 3.4.1 running on Slackware, with Bind in 
caching/recursive mode. For months one of them has been unable to 
correctly do dns blocklists (but the queries are not blocked). I have 
pored over the logs, and the main difference is that, although both of 
them pick up on the bad urls in the body of the message, the bad server 
is unable to resolve the url to an IP address for some reason (but dig 
works fine on the command line on both servers):


On the good server:

dbg: uridnsbl: complete_ns_lookup NS:spamdomain.com
dbg: uridnsbl: got(1) NS for spamdomain.com: spamdomain.com. 45 IN NS 
ns3.bkdns.vn.


dbg: uridnsbl: complete_a_lookup A:spamdomain.com
dbg: uridnsbl: complete_a_lookup got(1) A for spamdomain.com: 
spamdomain.com. 45 IN A 1.2.3.4


On the broken server I only get:

dbg: uridnsbl: complete_ns_lookup NS:spamdomain.com
dbg: dns: dns reply 62167 is OK, 0 answer records
dbg: async: calling callback on key A:spamdomain.com
dbg: uridnsbl: complete_a_lookup A:spamdomain.com
dbg: dns: dns reply 36552 is OK, 0 answer records

Would anybody know why the broken server is unable to resolve domains to 
IP's in SA (but works ok through dig)? There are no error messages 
anywhere that I can find and spamassassin -D --lint is not complaining 
of anything.




Re: Dns Blocklists always returning 0 records

2017-03-27 Thread Sebastian Arcus


On 27/03/17 11:10, Kevin A. McGrail wrote:

On 3/27/2017 5:28 AM, Sebastian Arcus wrote:


And yet, no dns block lists make it to the final scores


I have only followed the thread briefly but check your versions of Net::DNS.


The good server has Net::DNS 0.83 - so way out of date. The problem 
server has Net::DNS 1.06 - so not quite latest, but still much newer 
than the server where SA works fine.


I've just upgraded Net::DNS on the problem server to 1.09 - I'm afraid 
SA is still reporting zero hits from dns blocklists:



Mar 27 21:24:05.900 [31500] dbg: async: calling callback on key 
dns:A:109.150.73.212.zen.spamhaus.org
Mar 27 21:24:05.930 [31500] dbg: dns: dns reply 17643 is OK, 0 answer 
records




But dig still gets a hit on the same server:

#dig 109.150.73.212.zen.spamhaus.org

; <<>> DiG 9.10.4-P1 <<>> 109.150.73.212.zen.spamhaus.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55153
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;109.150.73.212.zen.spamhaus.org. IN A

;; ANSWER SECTION:
109.150.73.212.zen.spamhaus.org. 808 IN A   127.0.0.4

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Mar 27 21:48:08 BST 2017
;; MSG SIZE  rcvd: 76




Re: Dns Blocklists always returning 0 records

2017-03-27 Thread Sebastian Arcus

On 26/03/17 14:12, David Jones wrote:

From: Sebastian Arcus <s.ar...@open-t.co.uk>
Sent: Sunday, March 26, 2017 4:23 AM
To: users@spamassassin.apache.org
Subject: Dns Blocklists always returning 0 records



I have a server with SA where I just can't seem to get DNS based block
lists / RBL working. I have tested the same email message against
another server, and it gets hits from DNS block lists. But on this
particular server they just don't seem to work - but the dns queries are
not blocked either.



1. Both servers are on SA 3.4.1
2. I've run sa-update on both of them.
3. Both servers have Perl Net::DNS installed
4. Both servers have Bind configured locally and running fine as a
caching name server.
5. On the problematic server, the dns based checks are being run, not
being blocked, but always returning 0 records.



What else can I check in the SA config or more widely on the server?
What could possibly cause this? Any suggestions would be much appreciated.



I attach below a snippet of spamassassin -D output from the problem
server - but I'm happy to enclose here, or upload the whole thing
somewhere else if it helps:



#spamassassin -D 2>&1 < /test_email.eml | grep -i -A 3 "answer records"







Mar 26 10:12:39.060 [7061] dbg: async: calling callback on key
dns:A:109.150.73.212.bb.barracudacentral.org
Mar 26 10:12:39.062 [7061] dbg: dns: dns reply 61164 is OK, 0 answer records
Mar 26 10:12:39.062 [7061] dbg: async: calling callback on key
dns:A:109.150.73.212.zen.spamhaus.org
Mar 26 10:12:39.064 [7061] dbg: dns: dns reply 20939 is OK, 0 answer records
Mar 26 10:12:39.064 [7061] dbg: async: calling callback on key
dns:TXT:109.150.73.212.sa-accredit.habeas.com
Mar 26 10:12:39.066 [7061] dbg: dns: dns reply 56465 is OK, 0 answer records
Mar 26 10:12:39.066 [7061] dbg: async: calling callback on key
dns:A:109.150.73.212.iadb.isipp.com
Mar 26 10:12:39.069 [7061] dbg: dns: dns reply 19262 is OK, 0 answer records






I get this response on my working SA servers for the IP address above:

;; ANSWER SECTION:
109.150.73.212.zen.spamhaus.org. 300 IN A   127.0.0.4

What does the output of this command say on your SA server?

dig test.dbl.spamhaus.org

Compare the output on both servers.  I suspect this will point you in
the right direction.  For example, "SERVER:" should point to 127.0.0.1.


On the problem server, if I run:

#dig 109.150.73.212.zen.spamhaus.org

I get:

;; ANSWER SECTION:
109.150.73.212.zen.spamhaus.org. 337 IN A   127.0.0.4

And I can also see it is using 127.0.0.1 as the server.

I can even see in the SA debug output (on the problem server):

Mar 27 10:25:12.173 [23914] dbg: dns: hit 
 127.0.0.4


And yet, no dns block lists make it to the final scores.
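
For anyone comparing the names in the debug output against the dig commands, this Python sketch shows how the IPv4 blocklist query name is built: the octets are reversed and the list's zone is appended. The function name is made up; the IP and zone are the ones from this thread.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name looked up to check an IPv4 address against a blocklist zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


query = dnsbl_query_name("212.73.150.109", "zen.spamhaus.org")
```

The resulting name is the one that appears in the log and in the dig output, which makes it easier to confirm that SA and dig are asking the same question.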


Dns Blocklists always returning 0 records

2017-03-26 Thread Sebastian Arcus
I have a server with SA where I just can't seem to get DNS based block 
lists / RBL working. I have tested the same email message against 
another server, and it gets hits from DNS block lists. But on this 
particular server they just don't seem to work - but the dns queries are 
not blocked either.


1. Both servers are on SA 3.4.1
2. I've run sa-update on both of them.
3. Both servers have Perl Net::DNS installed
4. Both servers have Bind configured locally and running fine as a 
caching name server.
5. On the problematic server, the dns based checks are being run, not 
being blocked, but always returning 0 records.


What else can I check in the SA config or more widely on the server? 
What could possibly cause this? Any suggestions would be much appreciated.


I attach below a snippet of spamassassin -D output from the problem 
server - but I'm happy to enclose here, or upload the whole thing 
somewhere else if it helps:


#spamassassin -D 2>&1 < /test_email.eml | grep -i -A 3 "answer records"



Mar 26 10:12:39.060 [7061] dbg: async: calling callback on key 
dns:A:109.150.73.212.bb.barracudacentral.org

Mar 26 10:12:39.062 [7061] dbg: dns: dns reply 61164 is OK, 0 answer records
Mar 26 10:12:39.062 [7061] dbg: async: calling callback on key 
dns:A:109.150.73.212.zen.spamhaus.org

Mar 26 10:12:39.064 [7061] dbg: dns: dns reply 20939 is OK, 0 answer records
Mar 26 10:12:39.064 [7061] dbg: async: calling callback on key 
dns:TXT:109.150.73.212.sa-accredit.habeas.com

Mar 26 10:12:39.066 [7061] dbg: dns: dns reply 56465 is OK, 0 answer records
Mar 26 10:12:39.066 [7061] dbg: async: calling callback on key 
dns:A:109.150.73.212.iadb.isipp.com

Mar 26 10:12:39.069 [7061] dbg: dns: dns reply 19262 is OK, 0 answer records





Re: Different bayes results from command line and through MTA

2016-12-23 Thread Sebastian Arcus

On 23/12/16 17:02, Andrzej A. Filip wrote:

Sebastian Arcus <s.ar...@open-t.co.uk> wrote:

On 23/12/16 10:12, Sebastian Arcus wrote:

I know this hot potato has been discussed before - but I'm afraid it's
back to haunt me and I can't fathom it out. I'm getting again different
bayes results if I test a message on the command line, compared to it
going through exim -> spamassassin.




OK - after staring for a good while at debug logs, I think I finally
found the culprit. The saved .eml file which I pass through spamc
contains the report embedded by spamassassin in the headers (that's
how my Exim is configured). This report includes the first few lines
of the actual email body. This in turn has the effect of effectively
doubling the Bayes score, as spamassassin tokenizes these sample lines
on top of the actual email body. As the email body for these
particular spam emails is small - the sample in the header is almost
equal in size with the text in the email body itself.

As soon as I manually delete the SA headers and report in the .eml
file, and pass the message again through spamc, I get identical Bayes
scores to the ones when the message passes initially through Exim ->
SA.

However, this raises some interesting questions. It would appear that
SA is incapable of recognising its own reports in the header of the
emails, and tokenizes them as well and adds them to the Bayes
report. Is that right?

Also, does it mean that, as SA tokenizes all the info in the headers,
my own email address, as the recipient of the email, will also be
added to the database of spam tokens - when I ask SA to learn a
message as spam?

I seem to have ended up with more questions than I started :-)


Have you considered using bayes_ignore_header in spamassassin
configuration file?

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html



Many thanks for the suggestion - I didn't know about bayes_ignore_header

One quick question - does anybody know if bayes_ignore_header takes 
effect both when classifying email *and* when learning spam/ham?
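
For reference, a minimal local.cf sketch of the bayes_ignore_header approach suggested above. The header names are placeholders, since the custom header names used on this server are not given in the thread; substitute whatever names your MTA actually adds.

```
# local.cf
# Header names below are hypothetical examples, not taken from the thread.
bayes_ignore_header X-My-Spam-Report
bayes_ignore_header X-My-Spam-Score
```

After editing, "spamassassin --lint" should still run cleanly, and spamd has to be restarted before it picks up the change.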


Re: Different bayes results from command line and through MTA

2016-12-23 Thread Sebastian Arcus

On 23/12/16 17:18, Paul Stead wrote:



On 23/12/2016, 13:35, "Sebastian Arcus" <s.ar...@open-t.co.uk> wrote:

As soon as I manually delete the SA headers and report in the .eml file,
and pass the message again through spamc, I get identical Bayes scores
to the ones when the message passes initially through Exim -> SA.

http://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/axb/23_bayes_ignore_header.cf
 this is a sandbox ruleset but it answers your question here and also prevents 
other potentially bad signals.

However, this raises some interesting questions. It would appear that SA
is incapable of recognising its own reports in the header of the
emails, and tokenizes them as well and adds them to the Bayes report. Is
that right?

Spamassassin ignores certain headers - 
http://svn.apache.org/repos/asf/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm
 - note here that within $IGNORED_HDRS we have -

---8<---
 |X-Spam(?:-(?:Status|Level|Flag|Report|Hits|Score|Checker-Version))?
---8<---

Really SA should be ignoring the headers it puts there – do the headers match 
anything in that list?


No - I use my own custom headers. But I will configure SA to ignore 
them - as per suggestion from Andrzej




Also, does it mean that, as SA tokenizes all the info in the headers, my
own email address, as the recipient of the email, will also be added to
the database of spam tokens - when I ask SA to learn a message as spam?

As above, headers like “X-Envelope-To” and “X-Delivered-To” etc etc are 
ignored, however the To: header is not as this can be a good indicator – for 
example, if a ‘spoofed’ To header isn’t matching the actual recipient of the 
email within your system… *mumble* numbers and things


Thank you very much for the explanation


Re: Different bayes results from command line and through MTA

2016-12-23 Thread Sebastian Arcus

On 23/12/16 10:12, Sebastian Arcus wrote:

I know this hot potato has been discussed before - but I'm afraid it's
back to haunt me and I can't fathom it out. I'm getting again different
bayes results if I test a message on the command line, compared to it
going through exim -> spamassassin.


OK - after staring for a good while at debug logs, I think I finally 
found the culprit. The saved .eml file which I pass through spamc 
contains the report embedded by spamassassin in the headers (that's how 
my Exim is configured). This report includes the first few lines of the 
actual email body. This in turn has the effect of effectively doubling 
the Bayes score, as spamassassin tokenizes these sample lines on top of 
the actual email body. As the email body for these particular spam 
emails is small - the sample in the header is almost equal in size with 
the text in the email body itself.


As soon as I manually delete the SA headers and report in the .eml file, 
and pass the message again through spamc, I get identical Bayes scores 
to the ones when the message passes initially through Exim -> SA.


However, this raises some interesting questions. It would appear that SA 
is incapable of recognising its own reports in the header of the 
emails, and tokenizes them as well and adds them to the Bayes report. Is 
that right?


Also, does it mean that, as SA tokenizes all the info in the headers, my 
own email address, as the recipient of the email, will also be added to 
the database of spam tokens - when I ask SA to learn a message as spam?


I seem to have ended up with more questions than I started :-)
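
The manual header removal described above can be scripted so that a saved message is re-tested without its embedded report. This is only a sketch: the function name strip_headers is made up for illustration, and the default prefix X-Spam- is an assumption (with custom header names, pass their common prefix as the first argument).

```shell
strip_headers() {
    # $1: header-name prefix to drop, matched case-insensitively
    #     (default X-Spam- is an assumption; adjust for custom headers)
    awk -v prefix="${1:-X-Spam-}" '
        BEGIN { p = tolower(prefix); in_headers = 1; skipping = 0 }
        # The first empty line ends the header block; the body is left alone.
        in_headers && /^\r?$/ { in_headers = 0; skipping = 0 }
        # Folded (continuation) lines share the fate of the header they belong to.
        in_headers && /^[ \t]/ { if (!skipping) print; next }
        # A new header line: drop it if its name starts with the prefix.
        in_headers { skipping = (index(tolower($0), p) == 1); if (skipping) next }
        { print }
    '
}
```

Usage, with the file name from the original post: strip_headers < /test_message.eml | spamc -R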



Different bayes results from command line and through MTA

2016-12-23 Thread Sebastian Arcus
I know this hot potato has been discussed before - but I'm afraid it's 
back to haunt me and I can't fathom it out. I'm getting again different 
bayes results if I test a message on the command line, compared to it 
going through exim -> spamassassin.


The header of the message received in the Inbox contains the following 
report:


 Content analysis details:   (10.5 points, 4.2 required)

  pts rule name              description
 ---- ---------------------- ------------------------------------------
  0.4 STOX_REPLY_TYPE        No description available.
  3.0 DATE_IN_FUTURE_03_06   Date: is 3 to 6 hours after Received: date
  3.2 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                             [score: 0.5000]
  0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76
                             chars
  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable
                             relay lines
  1.8 STOX_REPLY_TYPE_WITHOUT_QUOTES No description available.
  2.1 FREEMAIL_FORGED_REPLYTO Freemail in Reply-To, but not From

While if I test it on the command line (spamc -R < /test_message.eml), I 
get really different results:


Content analysis details:   (20.2 points, 4.2 required)

 pts rule name              description
---- ---------------------- ------------------------------------------
 4.9 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.4 STOX_REPLY_TYPE        No description available.
 3.0 DATE_IN_FUTURE_03_06   Date: is 3 to 6 hours after Received: date
 8.0 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76
                            chars
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable
                            relay lines
 1.8 STOX_REPLY_TYPE_WITHOUT_QUOTES No description available.
 2.1 FREEMAIL_FORGED_REPLYTO Freemail in Reply-To, but not From


On the command line it is hitting BAYES_99 and BAYES_999 - while through 
Exim it doesn't. I know the first thing to look for is file 
permissions for the bayes databases. I've checked them. Also, I have 
spamassassin listening on a TCP port - and both Exim and spamc connect 
to it this way (I believe) - so permissions shouldn't make a difference 
between the two methods of testing the email - is that correct?


Also, I use a site-wide bayes database - so only one set of files.

I'm running spamd under the "spamd" user - which owns the bayes database 
files and directory:


/usr/bin/spamd -d -l --pidfile=/var/run/spamd/spamd.pid --username=spamd

What could possibly account for the large discrepancy in bayes results?
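
When two reports for the same message disagree like this, it can help to reduce each one to its list of rule names and compare the lists. A small sketch, assuming each report has been saved to its own file in the normal whitespace-separated layout; the function names are made up for illustration.

```shell
report_rules() {
    # Print the sorted, de-duplicated rule names found in SpamAssassin
    # "Content analysis details" reports (files as arguments, or stdin).
    # A rule line starts with a score such as 3.2 or -0.1, then the rule name.
    awk '$1 ~ /^-?[0-9]+\.[0-9]+$/ && $2 ~ /^[A-Z_][A-Z0-9_]*$/ { print $2 }' "$@" | sort -u
}

compare_reports() {
    # $1, $2: files holding the two reports to compare
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    report_rules "$1" > "$tmp_a"
    report_rules "$2" > "$tmp_b"
    echo "only in $1:"; comm -23 "$tmp_a" "$tmp_b"
    echo "only in $2:"; comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

For the two reports above this would list BAYES_50 on one side and BAYES_99 and BAYES_999 on the other, which narrows the problem down to Bayes straight away.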


Re: Spamassassin uses bayes, but spamd doesn't

2016-06-17 Thread Sebastian Arcus

On 17/06/16 14:49, RW wrote:

On Fri, 17 Jun 2016 14:07:33 +0100
Sebastian Arcus wrote:





Site-wide bayes files are owned
by spamd. Regarding the daemon, it is started with
--socketowner=spamd and socketpath=spamd. Is this enough, or
should it be actually started with "su" as "spamd" user?


If you start it as root with the -u spamd (or --username) it will drop
privileges to spamd. Starting it as root allows it to bind to a low
port should you need that.



"socketpath=spamd" sounds idiotic, however for a site-wide setup
there is no point in starting it as root instead of directly as the
correct user, see below. I can't say anything about "su" in service
files since I haven't touched sysvinit for 5 years now


That is probably so - I've taken another look at my startup scripts,
and I have to say it feels like I've been tying myself in knots with
--socketowner and --socketgroup and --username. I was thinking that
for my setup using:

--username=spamd --socketowner=exim --socketgroup=exim

might be the most suitable. Is it better to run it instead with

--socketmode=666


You should use -u,--username unless you need to access per user data
from unix home directories. You need this even if you start directly as
spamd.


and not bother with setting owner and group for the socket?


Is there any particular reason for even using a socket file?



A good point - if I leave them out, spamd will talk on the default IP 
port, and Exim can do that as well. Thank you for suggesting!


Re: Spamassassin uses bayes, but spamd doesn't

2016-06-17 Thread Sebastian Arcus


On 16/06/16 18:46, Sebastian Arcus wrote:

I have a particular server running spamd which uses bayes every time I
test it by hand, but apparently never when it goes through exim/spamd.

I run everything (both the spamd daemon and the manual tests) as user
spamd. I checked the permissions on the bayes database. I use a global
bayes database in /var/spool/spamd/bayes/. I ran "spamassassin -D
--lint" - and I get no failures - both as root and as the user spamd.

In spite of all of the above, it looks pretty clear that bayes is only
used when I run an email manually through spamassassin, but not when it
goes from exim through spamd.

Here is the report when run from the command line:

Content analysis details:   (5.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- ------------------------------------------
 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5000]
 0.0 HTML_IMAGE_RATIO_06    BODY: HTML has a low ratio of text to image
                            area
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical
                            to background
 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
 0.0 T_KAM_HTML_FONT_INVALID BODY: Test for Invalidly Named or Formatted
                            Colors in HTML
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not
                            necessarily valid
 0.2 RDNS_NONE              Delivered to internal network by a host
                            with no rDNS
 0.0 T_DKIM_INVALID         DKIM-Signature header exists but is not
                            valid
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable
                            relay lines
 0.0 LOTS_OF_MONEY          Huge... sums of money
 1.5 SUBJ_ILLEGAL_CHARS     Subject: has too many raw illegal characters
 0.0 MIME_HTML_ONLY_MULTI   Multipart message only has text/html MIME
                            parts
 0.0 SUBJECT_NEEDS_ENCODING Subject is encoded but does not specify the
                            encoding


And here is the report included in the same email message when it comes
through exim:

 Content analysis details:   (1.9 points, 5.0 required)

  pts rule name              description
 ---- ---------------------- ------------------------------------------
  0.7 MPART_ALT_DIFF         BODY: HTML and text parts are different
  0.0 HTML_IMAGE_RATIO_06    BODY: HTML has a low ratio of text to image
                             area
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical
                             to background
  0.0 T_KAM_HTML_FONT_INVALID BODY: Test for Invalidly Named or Formatted
                             Colors in HTML
  1.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 -0.1 DKIM_VALID             Message has at least one valid DKIM or DK
                             signature
 -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature
                             from author's domain
  0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not
                             necessarily valid
  0.0 LOTS_OF_MONEY          Huge... sums of money
  0.2 RDNS_NONE              Delivered to internal network by a host
                             with no rDNS
  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable
                             relay lines
  0.0 MIME_HTML_ONLY_MULTI   Multipart message only has text/html MIME
                             parts


Bayes is clearly not being used when it goes through spamd. Does anybody
know what could be causing this?


OK - thank you to everybody who helped with hints and info. Bayes is 
finally working now. What I initially had in place is:


1. Site-wide bayes db in /var/spool/spamd/bayes/ and owned by spamd.spamd
2. Spamd socket owned by spamd.spamd - which, it turns out, didn't make 
much sense
3. Spamd ran as root - for some reason I got confused and thought 
setting the owner/group for the socket meant spamd was run as non-root user.


What I have now:
1. Spamd socket owned by exim.exim (as it is the only piece of software 
which needs to talk to spamd) - and mode set to 0660.

2. Spamd runs as "spamd" user.
3. Bayes db still in the same place as above and with the same ownership 
- but I've set them as 0660


In conclusion it would appear that running spamd as root was the 
cause of the problem - although root should have been able to access the 
bayes database anyway. I'm a little lost on that point I'm afraid. But I 
think it's been a good opportunity to straighten the setup both on the 
server and in my head :-) Thank you again.
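
For anyone landing here later, the "what I have now" setup could be started roughly as below. This is a sketch under assumptions, not the exact script in use: the socket path and the SPAMD_BIN/SPAMD_SOCKET variables are invented for illustration, and it assumes spamd is launched as root so that it can hand the socket to exim before dropping privileges.

```shell
# Assumptions (not from the thread): the socket path and both variables.
SPAMD_BIN="${SPAMD_BIN:-/usr/bin/spamd}"
SPAMD_SOCKET="${SPAMD_SOCKET:-/var/run/spamd/spamd.sock}"

start_spamd() {
    # Launched as root: the socket is given to exim with mode 0660, and
    # --username makes spamd itself run as the unprivileged "spamd" user.
    "$SPAMD_BIN" -d \
        --pidfile=/var/run/spamd/spamd.pid \
        --username=spamd \
        --socketpath="$SPAMD_SOCKET" \
        --socketowner=exim --socketgroup=exim --socketmode=0660
}
```

If spamd listens on its default TCP port instead, the four --socket* options are simply left out.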


Re: Spamassassin uses bayes, but spamd doesn't

2016-06-17 Thread Sebastian Arcus


On 17/06/16 04:46, Bill Cole wrote:

On 16 Jun 2016, at 13:46, Sebastian Arcus wrote:


I have a particular server running spamd


Which must run on a particular platform. Since SpamAssassin and Exim can
run on a decade's worth of versions of at least 9 different OSs and one
of those (Linux) has about a half-dozen distinctly different families of
distributions that have become quite divergent, it would help to
identify your OS and version (or if Linux, which distro & its version)
when seeking help from people who don't keep track of what sorts of
systems you run. This helps constrain the scope of sane guessing...

(However, the ability to run arbitrary programs as 'root' implies a
POSIX-y platform with a true-root security model, so I'll assume this
isn't some Windows-Frankenstein abomination or El Capitan)


which uses bayes every time I test it by hand, but apparently never
when it goes through exim/spamd.

I run everything (both the spamd daemon and the manual tests) as user
spamd. I checked the permissions on the bayes database. I use a global
bayes database in /var/spool/spamd/bayes/.


Provide `ls -la /var/spool/spamd/bayes/`, please. Or if the problem that
reveals is obvious, just fix it and you're welcome. :)


I ran "spamassassin -D --lint" - and I get no failures - both as root
and as the user spamd.


And when you run spamassassin as root, you risk having root steal the
Bayes and AWL DBs. Presumably this is why some misguided articles online
documenting SA setup for system-wide use recommend deeply wrong things
like 'chmod -R 777' on your database directory. Don't do that. Ever. On
any directory. Use an ad hoc group, BSD directory setgid semantics or
fileflags, ACLs, a script that runs from cron every minute, or whatever
else can work on your platform to assure that spamd can always read and
write to everything in that directory, but DO NOT 777 it.


In spite of all of the above, it looks pretty clear that bayes is only
used when I run an email manually through spamassassin, but not when
it goes from exim through spamd.


Is spamd configured to do any logging? By default on POSIX platforms it
logs under the mail facility and if it can't open the BayesDB it will
log that fact. If it does so but there's no ownership/permission problem
it could also be due to SELinux, running spamd in a chroot jail (bad
idea,) or maybe AppArmor (about which I know nothing other than that
it's an alternative to SELinux.) These are solvable problems.


Thank you for all the suggestions above - and you are right, I should 
have been more specific about my setup. I'll report back to the list 
with progress or when it is solved.
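
As an illustration of the "ad hoc group" alternative to chmod 777 mentioned above, here is a minimal sketch. The function name is made up, and it assumes a platform where the setgid bit on a directory makes new files inherit the directory's group; whether newly created database files end up group-writable still depends on the umask of whatever creates them.

```shell
share_bayes_dir() {
    # $1: Bayes directory, $2: group allowed to read and write the databases
    dir=$1 group=$2
    chgrp -R "$group" "$dir" &&
    # rwx for owner and group plus setgid, nothing for others
    chmod 2770 "$dir" &&
    # existing database files: read/write for owner and group only
    find "$dir" -type f -exec chmod 0660 {} +
}
```

Usage with the paths from this thread: share_bayes_dir /var/spool/spamd/bayes spamd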


Re: Spamassassin uses bayes, but spamd doesn't

2016-06-17 Thread Sebastian Arcus

On 17/06/16 13:42, Reindl Harald wrote:



Am 17.06.2016 um 14:29 schrieb Sebastian Arcus:

On 17/06/16 00:03, Reindl Harald wrote:



Am 16.06.2016 um 19:46 schrieb Sebastian Arcus:

I have a particular server running spamd which uses bayes every time I
test it by hand, but apparently never when it goes through exim/spamd


then you need to run it as the correct user or train it as the correct user


Thank you for the suggestion. There is no training involved, and
auto-learn is switched off in local.cf


how do you imagine bayes working then?


These are bayes databases from another server - any training happens 
there - so training and auto-learning is disabled on this particular server.





Site-wide bayes files are owned
by spamd. Regarding the daemon, it is started with --socketowner=spamd
and socketpath=spamd. Is this enough, or should it be actually started
with "su" as "spamd" user?


"socketpath=spamd" sounds idiotic, however for a site-wide setup there
is no point in starting it as root instead of directly as the correct
user, see below. I can't say anything about "su" in service files since
I haven't touched sysvinit for 5 years now


That is probably so - I've taken another look at my startup scripts, and 
I have to say it feels like I've been tying myself in knots with 
--socketowner and --socketgroup and --username. I was thinking that for 
my setup using:


--username=spamd --socketowner=exim --socketgroup=exim

might be the most suitable. Is it better to run it instead with

--socketmode=666

and not bother with setting owner and group for the socket?


Re: Spamassassin uses bayes, but spamd doesn't

2016-06-17 Thread Sebastian Arcus

On 17/06/16 00:03, Reindl Harald wrote:



Am 16.06.2016 um 19:46 schrieb Sebastian Arcus:

I have a particular server running spamd which uses bayes every time I
test it by hand, but apparently never when it goes through exim/spamd


then you need to run it as the correct user or train it as the correct user



Thank you for the suggestion. There is no training involved, and 
auto-learn is switched off in local.cf. Site-wide bayes files are owned 
by spamd. Regarding the daemon, it is started with --socketowner=spamd 
and socketpath=spamd. Is this enough, or should it be actually started 
with "su" as "spamd" user?


Re: Spamassassin uses bayes, but spamd doesn't

2016-06-17 Thread Sebastian Arcus


On 17/06/16 03:46, Yu Qian wrote:

you can use spamd -D to check the log for exactly what bayes db path
your spamd was using.


Thanks, Yu. Based on the output below, it appears to find and use the 
sitewide bayes files ok:


# spamd -D 2>&1 | grep -i bayes
Jun 17 13:32:51.719 [4380] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Bayes from @INC
Jun 17 13:32:52.058 [4380] dbg: config: fixed relative path: 
/var/lib/spamassassin/3.004001/updates_spamassassin_org/23_bayes.cf
Jun 17 13:32:52.058 [4380] dbg: config: using 
"/var/lib/spamassassin/3.004001/updates_spamassassin_org/23_bayes.cf" 
for included file
Jun 17 13:32:52.058 [4380] dbg: config: read file 
/var/lib/spamassassin/3.004001/updates_spamassassin_org/23_bayes.cf
Jun 17 13:32:53.370 [4380] dbg: plugin: 
Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48) implements 
'learner_new', priority 0
Jun 17 13:32:53.371 [4380] dbg: bayes: learner_new 
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48), 
bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Jun 17 13:32:53.390 [4380] dbg: bayes: learner_new: got 
store=Mail::SpamAssassin::BayesStore::DBM=HASH(0xab6a6a0)
Jun 17 13:32:53.391 [4380] dbg: plugin: 
Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48) implements 
'learner_is_scan_available', priority 0
Jun 17 13:32:53.391 [4380] dbg: bayes: tie-ing to DB file R/O 
/var/spool/spamd/bayes/bayes_toks
Jun 17 13:32:53.392 [4380] dbg: bayes: tie-ing to DB file R/O 
/var/spool/spamd/bayes/bayes_seen

Jun 17 13:32:53.393 [4380] dbg: bayes: found bayes db version 3
Jun 17 13:32:53.394 [4380] dbg: bayes: DB journal sync: last sync: 
1466097119
Jun 17 13:32:55.405 [4380] dbg: plugin: 
Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48) implements 
'learner_close', priority 0

Jun 17 13:32:55.405 [4380] dbg: bayes: untie-ing
Jun 17 13:32:55.487 [4380] dbg: plugin: 
Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48) implements 
'prefork_init', priority 0
Jun 17 13:32:55.492 [4385] dbg: plugin: 
Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48) implements 
'spamd_child_init', priority 0
Jun 17 13:32:55.497 [4386] dbg: plugin: 
Mail::SpamAssassin::Plugin::Bayes=HASH(0xa936c48) implements 
'spamd_child_init', priority 0





Spamassassin uses bayes, but spamd doesn't

2016-06-16 Thread Sebastian Arcus
I have a particular server running spamd which uses bayes every time I 
test it by hand, but apparently never when it goes through exim/spamd.


I run everything (both the spamd daemon and the manual tests) as user 
spamd. I checked the permissions on the bayes database. I use a global 
bayes database in /var/spool/spamd/bayes/. I ran "spamassassin -D 
--lint" - and I get no failures - both as root and as the user spamd.


In spite of all of the above, it looks pretty clear that bayes is only 
used when I run an email manually through spamassassin, but not when it 
goes from exim through spamd.


Here is the report when run from the command line:

Content analysis details:   (5.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- ------------------------------------------
 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5000]
 0.0 HTML_IMAGE_RATIO_06    BODY: HTML has a low ratio of text to image
                            area
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical
                            to background
 0.8 MPART_ALT_DIFF         BODY: HTML and text parts are different
 0.0 T_KAM_HTML_FONT_INVALID BODY: Test for Invalidly Named or Formatted
                            Colors in HTML
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not
                            necessarily valid
 0.2 RDNS_NONE              Delivered to internal network by a host
                            with no rDNS
 0.0 T_DKIM_INVALID         DKIM-Signature header exists but is not
                            valid
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable
                            relay lines
 0.0 LOTS_OF_MONEY          Huge... sums of money
 1.5 SUBJ_ILLEGAL_CHARS     Subject: has too many raw illegal characters
 0.0 MIME_HTML_ONLY_MULTI   Multipart message only has text/html MIME
                            parts
 0.0 SUBJECT_NEEDS_ENCODING Subject is encoded but does not specify the
                            encoding


And here is the report included in the same email message when it comes 
through exim:


 Content analysis details:   (1.9 points, 5.0 required)

  pts rule name              description
 ---- ---------------------- ------------------------------------------
  0.7 MPART_ALT_DIFF         BODY: HTML and text parts are different
  0.0 HTML_IMAGE_RATIO_06    BODY: HTML has a low ratio of text to image
                             area
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical
                             to background
  0.0 T_KAM_HTML_FONT_INVALID BODY: Test for Invalidly Named or Formatted
                             Colors in HTML
  1.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 -0.1 DKIM_VALID             Message has at least one valid DKIM or DK
                             signature
 -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature
                             from author's domain
  0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not
                             necessarily valid
  0.0 LOTS_OF_MONEY          Huge... sums of money
  0.2 RDNS_NONE              Delivered to internal network by a host
                             with no rDNS
  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable
                             relay lines
  0.0 MIME_HTML_ONLY_MULTI   Multipart message only has text/html MIME
                             parts



Bayes is clearly not being used when it goes through spamd. Does anybody 
know what could be causing this?


[Solved] Re: Error when trying to re-use Bayes database from one server to another

2016-02-14 Thread Sebastian Arcus

On 13/02/16 18:58, Bill Cole wrote:

On 13 Feb 2016, at 3:49, Sebastian Arcus wrote:


Thank you. The donor machine has db42, db44 and db44 packages installed,


Based on the question below, I'll assume the second db44 above was a 
typo for db48, i.e. a Berkeley DB v4.8.x package.


Tangentially: that's a risky mess. It's a common problem but you 
should try to fix it to leave just one version, which probably means 
rebuilding a number of pieces of software. Using db48 for everything 
isn't a bad choice, despite the current version being 6.something, 
because there are still perfectly good pieces of software that use 
db4x but nothing later. In any case, you have a potentially fragile 
system there which may have different programs using diverse Berkeley 
DB versions which may be broken by otherwise routine updates. If you 
choose to leave a working system alone rather than proactively clean 
it up, be sure to


while the recipient machine only db42 and db44. Would it be enough to 
install db48 on the recipient machine, or are there also any 
glue/library Perl modules involved which SA uses for db access and 
would need to be updated as well?


Any answer to that has so many conditional branches that I'm unwilling 
to attempt a definitive one. You definitely need to install db48 on 
the recipient machine if you want it to be able to read hash files 
created elsewhere by db48. Depending on what other software is using 
db42 and db44 there, installing db48 and doing nothing else MIGHT break 
something. Depending on how Perl was built and/or installed on that 
machine and how the various db* packages are installed it MIGHT be 
necessary to rebuild your core Perl package and/or non-core packages 
which may include BerkeleyDB or (probably not) DB_File and maybe (but 
most likely not) SpamAssassin itself. Figuring out what exactly 
depends on which package on a specific system (which you've not 
described in any detail) is an opportunity to exercise your core 
system administration skills :).


Thank you everybody who pitched in with suggestions. Just to confirm 
that in the end I decided not to mess too much with a working system and 
didn't upgrade to db48 on the older system. I went down the route of 
backing up and restoring the bayes database using sa-learn - which 
worked perfectly fine.


There is still the question of the initial sa-learn error message which 
started all this. In my opinion it looks like a bug - as it says 
"missing file" - which is clearly not the case. Something more helpful 
such as "can't decode file, possible wrong format" - or anything else 
along those lines would be more relevant and helpful. Should I log this 
as a bug somewhere?
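
For the archives, the backup-and-restore route looks roughly like this. It is a sketch rather than the exact commands used: the function names, the SA_LEARN variable and the dump file name are made up, while --backup and --restore are the sa-learn options themselves. Both should be run as the user that owns the Bayes files (spamd here), and note that --restore replaces whatever Bayes data already exists on the receiving machine.

```shell
SA_LEARN="${SA_LEARN:-sa-learn}"

bayes_export() {
    # $1: dump file to create on the donor machine
    "$SA_LEARN" --backup > "$1"
}

bayes_import() {
    # $1: dump file copied over from the donor machine
    # This wipes the existing Bayes data before loading the dump.
    "$SA_LEARN" --restore "$1"
}
```

Because the dump is plain text, it does not depend on the Berkeley DB version installed on either machine. Typical use: bayes_export bayes_dump.txt on the donor, copy the file across, then bayes_import bayes_dump.txt on the recipient.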




Re: Error when trying to re-use Bayes database from one server to another

2016-02-13 Thread Sebastian Arcus

On 13/02/16 18:58, Bill Cole wrote:

On 13 Feb 2016, at 3:49, Sebastian Arcus wrote:


Thank you. The donor machine has db42, db44 and db44 packages installed,


Based on the question below, I'll assume the second db44 above was a 
typo for db48, i.e. a Berkeley DB v4.8.x package.


Yes - sorry, you are right



Tangentially: that's a risky mess. It's a common problem but you 
should try to fix it to leave just one version, which probably means 
rebuilding a number of pieces of software.


Slackware current comes with all three versions - that's a default 
install and I checked the package list at Slackware.com. I'm afraid I'm 
not sure why, but I assume there is some logic to it - as the package 
choice in Slackware always seems to have some reasoning behind it. I 
also don't know why Slackware current doesn't include version 6.x (or 
even 4.6) - maybe something to do with the current politics of Oracle - 
or some other technical reason.


Using db48 for everything isn't a bad choice, despite the current 
version being 6.something, because there are still perfectly good 
pieces of software that use db4x but nothing later. In any case, you 
have a potentially fragile system there which may have different 
programs using diverse Berkeley DB versions which may be broken by 
otherwise routine updates. If you choose to leave a working system 
alone rather than proactively clean it up, be sure to


while the recipient machine only db42 and db44. Would it be enough to 
install db48 on the recipient machine, or are there also any 
glue/library Perl modules involved which SA uses for db access and 
would need to be updated as well?


Any answer to that has so many conditional branches that I'm unwilling 
to attempt a definitive one. You definitely need to install db48 on 
the recipient machine if you want it to be able to read hash files 
created elsewhere by db48. Depending on what other software is using 
db42 and db44 there, installing db48 and doing nothing else MIGHT break 
something. Depending on how Perl was built and/or installed on that 
machine and how the various db* packages are installed it MIGHT be 
necessary to rebuild your core Perl package and/or non-core packages 
which may include BerkeleyDB or (probably not) DB_File and maybe (but 
most likely not) SpamAssassin itself. Figuring out what exactly 
depends on which package on a specific system (which you've not 
described in any detail) is an opportunity to exercise your core 
system administration skills :).


Thank you :-)



Re: Error when trying to re-use Bayes database from one server to another

2016-02-13 Thread Sebastian Arcus

On 13/02/16 04:32, Bill Cole wrote:

On 12 Feb 2016, at 17:34, Sebastian Arcus wrote:

Thanks for that suggestion. I think we might be getting somewhere. On 
original machine:


#file bayes_seen
bayes_seen: Berkeley DB (Hash, version 9, native byte-order)

# file bayes_toks
bayes_toks: Berkeley DB (Hash, version 9, native byte-order)


On the receiver machine, but with bayes files created locally:

#file bayes_seen
bayes_seen: Berkeley DB (Hash, version 8, native byte-order)

# file bayes_toks
bayes_toks: Berkeley DB (Hash, version 8, native byte-order)


Could the hash version account for the errors I am seeing?


Absolutely. The BDB hash storage version number only changes when a 
change is NOT backwards-compatible, i.e. *BY DESIGN* a library version 
that creates v8 files cannot read v9 files. If my recollection is 
correct, the v8->9 change was in BDB 4.6 and actually provided 
substantial performance improvements. You probably want to upgrade BDB 
and anything using it on the machine with the old version.


Thank you. The donor machine has db42, db44 and db44 packages installed, 
while the recipient machine only db42 and db44. Would it be enough to 
install db48 on the recipient machine, or are there also any 
glue/library Perl modules involved which SA uses for db access and would 
need to be updated as well?
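
The hash-version check from the file output above is easy to script for comparing several machines. A minimal sketch; the function name is made up, and it assumes file prints the same "Berkeley DB (Hash, version N, ...)" wording shown in this thread.

```shell
bdb_hash_version() {
    # Reads `file` output on stdin and prints the Berkeley DB hash
    # format version number, or nothing if the line does not match.
    sed -n 's/.*Berkeley DB (Hash, version \([0-9][0-9]*\),.*/\1/p'
}
```

Usage: file /var/spool/spamd/bayes/bayes_toks | bdb_hash_version
If the number printed on the donor is higher than on the recipient, the recipient's library cannot be expected to open the copied files.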


Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Sebastian Arcus
As per advice from this list, I have been re-using my bayes databases on 
several different servers running SA. On one of the servers though, the 
database is not accepted. I re-transferred them several times over ssh, 
to make sure they were not corrupted. The database files are in the 
correct location, with correct permissions and owned by the correct user:


# ls -l /var/spool/spamd/bayes/
total 5912
-rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
-rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks

The version of SA on both the donor and receiving servers is 3.4.1.

When I try to learn a new message on the receiving server (where I moved 
the bayes files), I get the following error:


# su - spamd -c "/usr/bin/sa-learn -D --spam /New\ UnansweredSexHookup\ 
Request.eml"




Feb 12 16:20:53.777 [12973] dbg: locker: mode is 438
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: created /var/spool/spamd/bayes/bayes.lock.mdr-server.mdrinteriors.co.uk.12973
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: trying to get lock on /var/spool/spamd/bayes/bayes with 0 retries
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: link to /var/spool/spamd/bayes/bayes.lock: link ok
Feb 12 16:20:53.778 [12973] dbg: bayes: tie-ing to DB file R/W /var/spool/spamd/bayes/bayes_toks
Feb 12 16:20:53.779 [12973] dbg: bayes: untie-ing DB file toks
Feb 12 16:20:53.779 [12973] dbg: locker: safe_unlock: unlink /var/spool/spamd/bayes/bayes.lock
bayes: cannot open bayes databases /var/spool/spamd/bayes/bayes_* R/W: tie failed: No such file or directory
Learned tokens from 0 message(s) (1 message(s) examined)
Feb 12 16:20:53.779 [12973] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x93106d0) implements 'learner_close', priority 0
ERROR: the Bayes learn function returned an error, please re-run with -D for more information at /usr/bin/sa-learn line 498.




Re: Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Sebastian Arcus

On 12/02/16 16:59, Reindl Harald wrote:



On 12.02.2016 at 17:29, Sebastian Arcus wrote:

As per advice from this list, I have been re-using my bayes databases on
several different servers running SA. On one of the servers though, the
database is not accepted. I re-transferred them several times over ssh,
to make sure they were not corrupted. The database files are in the
correct location, with correct permissions and owned by the correct 
user:


# ls -l /var/spool/spamd/bayes/
total 5912
-rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
-rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks

The version of SA on both the donor and receiving servers is 3.4.1.

When I try to learn a new message on the receiving server (where I moved
the bayes files), I get the following error:


su - spamd
stat /var
stat /var/spool
stat /var/spool/spamd
stat /var/spool/spamd/bayes

Linux is not like Windows - if you don't have access to a parent folder 
you just don't have access
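
The parent-directory check suggested above can be scripted rather than run one `stat` at a time. This is a minimal sketch, assuming a POSIX shell and an absolute path; `check_path_access` is a hypothetical helper name, not part of SpamAssassin.

```shell
#!/bin/sh
# Walk an absolute path one component at a time and report the first
# component that is missing or that the current user cannot traverse.
check_path_access() {
    path=$1
    current=
    # Split the path on "/" into the function's positional parameters.
    old_ifs=$IFS
    IFS=/
    set -- $path
    IFS=$old_ifs
    for part in "$@"; do
        # A leading "/" produces an empty first field; skip it.
        [ -n "$part" ] || continue
        current="$current/$part"
        if [ ! -e "$current" ]; then
            echo "missing: $current"
            return 1
        fi
        # Directories need the execute (search) bit to be traversed.
        if [ -d "$current" ] && [ ! -x "$current" ]; then
            echo "no search permission: $current"
            return 1
        fi
    done
    echo "ok: $path"
}

# Example: check_path_access /var/spool/spamd/bayes/bayes_toks
```

It has to be run as the user in question (here spamd) for the result to mean anything, since root can traverse any directory.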





root@mdr-server:/# su - spamd
No directory, logging in with HOME=/

spamd@mdr-server:/$ stat /var
  File: `/var'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 12  Links: 16
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2016-01-18 09:28:23.0 +
Modify: 2016-01-18 09:22:47.0 +
Change: 2016-01-18 09:28:23.744774236 +

spamd@mdr-server:/$ stat /var/spool
  File: `/var/spool'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 118  Links: 22
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-02-03 14:28:33.0 +
Modify: 2015-12-03 17:41:28.859794403 +
Change: 2015-12-03 17:41:28.859794403 +

spamd@mdr-server:/$ stat /var/spool/spamd
  File: `/var/spool/spamd'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 15473107  Links: 3
Access: (0770/drwxrwx---)  Uid: ( 1037/   spamd)   Gid: (  252/ spamd)
Access: 2015-12-03 17:41:28.859794403 +
Modify: 2015-12-03 17:41:32.011239989 +
Change: 2015-12-03 17:46:59.187806044 +

spamd@mdr-server:/$ stat /var/spool/spamd/bayes/
  File: `/var/spool/spamd/bayes/'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 15473106  Links: 3
Access: (0776/drwxrwxrw-)  Uid: ( 1037/   spamd)   Gid: (  252/ spamd)
Access: 2015-12-03 17:41:32.011239989 +
Modify: 2016-02-12 16:20:53.778709980 +
Change: 2016-02-12 16:20:53.778709980 +



Re: Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Sebastian Arcus

On 12/02/16 19:14, Reindl Harald wrote:



On 12.02.2016 at 20:06, Marc Perkel wrote:

Any chance that the parent directory structure doesn't have enough
permissions?

The error message says it can't access it so there's your clue. Since
the files themselves seem to have good permissions I would look at the
directories.


see previous mail - that was already verified
looking closer "No such file or directory" is not a permission problem

there was a hint "please re-run with -D"

in any case, re-using bayes on different servers, even across different 
operating systems, is no problem; our bayes has been running on 3 of our own 
and 2 foreign machines for a long time now with great results


I've checked and triple checked everything. Unless I'm missing something 
blindingly obvious, I don't think that error message is accurate. If I 
delete the bayes files and restart spamd, on running sa-learn, new ones 
are created in exactly the same place, with same name and same 
permissions - and they work fine. But the ones brought over from the 
other server don't work.
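
If the copied files turn out to be in a format the local Berkeley DB library cannot read, an alternative to copying `bayes_toks` and `bayes_seen` directly is to move the data as a text dump using sa-learn's `--backup` and `--restore` options. A sketch, assuming both servers have a working sa-learn; the wrapper names `bayes_export` and `bayes_import` and the `SA_LEARN` variable are made up for illustration.

```shell
#!/bin/sh
# Command to run; override SA_LEARN to point at a different binary.
SA_LEARN=${SA_LEARN:-sa-learn}

# On the donor server: write the Bayes database out as plain text.
bayes_export() {
    "$SA_LEARN" --backup > "$1"
}

# On the receiving server: load the text dump into the local database,
# which is then written in whatever format the local library uses.
bayes_import() {
    "$SA_LEARN" --restore "$1"
}

# Example (run as the spamd user on each side):
#   bayes_export /tmp/bayes-backup.txt     # donor
#   bayes_import /tmp/bayes-backup.txt     # recipient
```

Note that a restore replaces whatever Bayes data is already on the receiving server, so it is worth taking a backup there first if anything has been learned locally.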


PS - Regarding the "re-run with -D for more information" - I guess that 
message is slightly pointless, as it keeps on saying that even when you 
run it with "-D".

