Re: SA-Update not updating DB

2017-11-16 Thread Rafael Leiva-Ochoa
Updated mine, it seems to be working so far.

On Wed, Nov 15, 2017 at 5:39 AM, Kevin A. McGrail wrote:

> On 11/15/2017 8:27 AM, David Jones wrote:
>
>>
>> We are getting closer to having complete rulesets working again.
>>
>> NOTE: The latest ruleset generated hours ago is still not quite complete
>> but I think it's safe for testing.  I have applied this ruleset to my
>> platforms.
>>
>> REV=1815188
>> wget http://sa-update.ena.com/${REV}.tar.gz
>> wget http://sa-update.ena.com/${REV}.tar.gz.sha1
>> wget http://sa-update.ena.com/${REV}.tar.gz.asc
>> sa-update -v --install ${REV}.tar.gz
>>
> FYI to others, I am testing in production and even the not quite perfect
> ones haven't had a large impact on production.
>
> Best,
> KAM
>


Re: SA-Update not updating DB

2017-11-16 Thread Chris
On Thu, 2017-11-16 at 09:06 -0600, David Jones wrote:
> On 11/16/2017 08:57 AM, Chris wrote:
> > 
> > On Thu, 2017-11-16 at 07:22 -0600, David Jones wrote:
> > > 
> > > > Great news!  Last night's run finally produced a full
> > > > 72_scores.cf.  Big thanks to Merijn van den Kroonenberg for
> > > > helping track down the remaining issues!  There was a difference
> > > > of about 3 rules, which could be expected after an 8-month gap.
> > > 
> > > # cat disappeared_rules.txt
> > > ADVANCE_FEE_4_NEW
> > > CN_B2B_SPAMMER
> > > URI_GOOGLE_PROXY
> > > 
> > > # wc -l 72_scores.cf
> > > 149 72_scores.cf
> > > 
> > > Now 149 lines and we were stuck at around 100 before.
> > > 
> > > https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/
> > > 
> > > > WE REALLY NEED TESTERS NOW TO APPLY THIS UPDATE AND PROVIDE
> > > > FEEDBACK.  I would like to enable DNS updates again for
> > > > sa-update on Sunday or Monday depending on the feedback.
> > > 
> > > REV=1815298
> > > wget http://sa-update.ena.com/${REV}.tar.gz
> > > wget http://sa-update.ena.com/${REV}.tar.gz.sha1
> > > wget http://sa-update.ena.com/${REV}.tar.gz.asc
> > > sa-update -v --install ${REV}.tar.gz
> > > 
> > > > (reload/restart whatever is calling SA -- spamd, amavisd-new,
> > > > mimedefang, MailScanner, etc.)
> > > 
> > > I have applied this ruleset to my platforms and will monitor
> > > scoring/blocking over the next couple of days.
> > > 
> > Hmm, the file doesn't seem to be able to be found unless of course
> > I did something incorrectly:
> > 
> > chris@localhost:~/Downloads$ wget http://sa-update.ena.com/${REV}.tar.gz
> > --2017-11-16 08:51:50--  http://sa-update.ena.com/.tar.gz
> > Resolving sa-update.ena.com (sa-update.ena.com)... 96.4.1.5, 96.5.1.5
> > Connecting to sa-update.ena.com (sa-update.ena.com)|96.4.1.5|:80... connected.
> > HTTP request sent, awaiting response... 404 Not Found
> > 2017-11-16 08:51:50 ERROR 404: Not Found.
> > 
> Make sure you ran the "REV=1815298" line first to set the variable
> that the next 4 lines use with "${REV}".
> 
I got it, David and Dave. Thanks for straightening me out.
-- 
Chris
KeyID 0xE372A7DA98E6705C
31.11972; -97.90167 (Elev. 1092 ft)
16:20:55 up 9 days, 7:52, 1 user, load average: 1.10, 0.72, 0.74
Description:Ubuntu 16.04.3 LTS, kernel 4.10.0-38-generic




Re: The rise of highly targeted spam emails

2017-11-16 Thread Martin Gregorie
> Thank you for the info. I haven't considered it before, but it makes 
> sense to store large mail archives in SQL databases. I suppose it is
> one of the few ways to efficiently search such a large volume of data
> - much faster than searching Maildir or MBOX archives.
> 
... and it lets you specify more than one search term too. Most decent
MUAs (I use Evolution) let you specify (optionally) the folder and a
match phrase for the sender name(s) and/or the subject. My query tool
lets me specify

  (partial sender name OR address) 
  AND partial subject 
  AND phrase in plaintext parts of the message body 
  AND a date range. 

Any of the four AND parts can be ignored if they would not help with
the search. The search tool uses a form-filling screen which allows any
or all of the ANDed search terms to be input, so it's quite easy to use.
It can also let the user select e-mail addresses and subjects from
scrollable lists, though it does take a second or two to present these
lists.
 
In practice I usually specify the date range and one of sender name,
partial subject or partial plaintext phrase. I expected searching
message plaintext body parts for matching phrases would be rather slow
but it has turned out to be quite a bit faster than I'd expected.

Bottom line: using a database and a dedicated search tool gives users a
lot of flexibility in searching for and retrieving archived messages
and it's a lot faster than scanning through their MUA's folders to find
old messages or discussions.
 
> I guess one aspect that is less than ideal is the fact that it
> wouldn't be possible to give archive access to users through their
> normal mail software interface - such as Thunderbird for example.
>
Something equivalent to my search tool should be easy enough for most 
users to install and use.

My search tool only shows its user a list of matching messages.
Selecting any of these causes the selected message(s) to be sent to the
user as an attachment to a cover message. This serves two purposes:
(1) it allows the user to see the unmodified headers of the retrieved
    message
(2) it means that I didn't have to faff round making the search tool
    play nicely with displaying the message and decoding attachments,
    etc. AND it means the user will see the retrieved message pretty
    much exactly as it would have been if they'd received it directly.

My search tool is a Java application, so it needs the Java JRE to be
installed on a user's PC, but it could equally easily have been a
web-based search tool written in PHP, JavaScript or whatever. All that
would need in addition would be a webserver installed somewhere where
it can access the database. I run a local copy of the Apache webserver
with access to both it and my PostgreSQL database restricted to
computers on my LAN.

 
Martin




Re: The rise of highly targeted spam emails

2017-11-16 Thread Sebastian Arcus


On 16/11/17 12:16, Martin Gregorie wrote:

On Thu, 2017-11-16 at 09:15 +, Sebastian Arcus wrote:

On 15/11/17 18:11, Martin Gregorie wrote:

On Wed, 2017-11-15 at 14:44 +, Sebastian Arcus wrote:





I initially decided that an archive was A Good Thing to have,
simply because retrieving mail from it should be a lot faster than
searching through huge mail folders. This turned out to be true in
practice: the archive currently holds 183,000 emails and a worst
case search takes around 30 seconds to return a list of hits
(running on a 3 GHz dual Athlon system with 4GB RAM and Fedora 25
as its OS).


Thank you for the details. How do you search the archive? With grep
directly on the server?


Using SQL queries.

The two main tables in the database hold e-mail addresses and messages
respectively plus there are many-to-many links between the two that are
implemented with a third table that holds the link type ('To' or
'From') and an additional table containing subject text - this has a
one-to-many relationship with the messages.

The SA plugin just looks at the From header in the message being
checked and, if it finds that address in the database, sees if there
are any 'To' links associated with it. If there are, then the message
gets negative points. As I said, this SQL query is actually run against
a database view that combines the address and link tables. Since the
rows on these tables are small and the tables are indexed on address
and link type, the query is very fast.

If you want to know more about the archive, look here:
http://www.libelle-systems.c3487738.myzen.co.uk/mailarchive/

Ignore the licensing stuff: I initially thought I might be onto a
revenue source, but remarkably few people use mail archives. I should
remove the license management code and open source the archive but so
far haven't got round to doing that.


Thank you for the info. I haven't considered it before, but it makes 
sense to store large mail archives in SQL databases. I suppose it is one 
of the few ways to efficiently search such a large volume of data - much 
faster than searching Maildir or MBOX archives.


I guess one aspect that is less than ideal is the fact that it wouldn't 
be possible to give archive access to users through their normal mail 
software interface - such as Thunderbird for example.


Re: all recipients with the same first character

2017-11-16 Thread RW
On Thu, 16 Nov 2017 11:56:03 +
MAYER Hans wrote:

> Dear All,
> 
> Analyzing some e-mails which are not caught by SA, I sometimes see the
> following scenario: such an e-mail is sent to a lot of people (not
> only to our own domain) and all e-mail addresses start with the same
> first character. If I see this I know immediately this is spam.
> 
> Is there a rule anywhere which can detect such behavior?

There are a couple of rules that do something similar: 


SORTED_RECIPS looks for 7 or more addresses in alphabetical order -
the kind of spam you are seeing is likely created from a sorted list.


SUSPICIOUS_RECIPS is a bit complicated, but it's sort of a two letter
version of what you want.
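
The two heuristics discussed here can be illustrated outside SpamAssassin with a short shell sketch. This is not how SORTED_RECIPS or SUSPICIOUS_RECIPS are implemented; check_recips is a hypothetical helper that reads one recipient address per line and reports whether all addresses share a first character and whether 7 or more of them are in alphabetical order.

```shell
#!/bin/sh
# check_recips: read recipient addresses (one per line) from stdin.
# Prints "same-first-char" when 2+ addresses all start with the same
# character (case-insensitive), and "sorted" when 7+ addresses are
# already in alphabetical order. Illustration only, not an SA rule.
check_recips() {
    addrs=$(tr '[:upper:]' '[:lower:]')
    # Number of non-empty address lines.
    count=$(printf '%s\n' "$addrs" | grep -c . || true)
    # Number of distinct first characters.
    distinct=$(printf '%s\n' "$addrs" | cut -c1 | LC_ALL=C sort -u | grep -c . || true)

    if [ "$count" -gt 1 ] && [ "$distinct" -eq 1 ]; then
        echo "same-first-char"
    fi
    # "sort -c" only checks the order; it succeeds if input is sorted.
    if [ "$count" -ge 7 ] && printf '%s\n' "$addrs" | LC_ALL=C sort -c 2>/dev/null; then
        echo "sorted"
    fi
}
```

For example, `printf '%s\n' adam@example.com alice@example.net amy@example.org | check_recips` reports `same-first-char`.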


Re: SA-Update not updating DB

2017-11-16 Thread Matthew Broadhead

Hi,
I downloaded and applied the update on CentOS 7 (kernel
3.10.0-693.2.2.el7.x86_64) with spamassassin-3.4.0-2.el7.x86_64.
Everything seems to be working fine so far.  I will let you know if
there are any issues.

Thanks for all the work on the update,
Matthew

On 16/11/2017 14:22, David Jones wrote:
Great news!  Last night's run finally produced a full 72_scores.cf. 
Big thanks to Merijn van den Kroonenberg for helping track down the 
remaining issues!  There was a difference of about 3 rules, which could
be expected after an 8-month gap.


# cat disappeared_rules.txt
ADVANCE_FEE_4_NEW
CN_B2B_SPAMMER
URI_GOOGLE_PROXY

# wc -l 72_scores.cf
149 72_scores.cf

Now 149 lines and we were stuck at around 100 before.

https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/

WE REALLY NEED TESTERS NOW TO APPLY THIS UPDATE AND PROVIDE FEEDBACK.  
I would like to enable DNS updates again for sa-update on Sunday or 
Monday depending on the feedback.


REV=1815298
wget http://sa-update.ena.com/${REV}.tar.gz
wget http://sa-update.ena.com/${REV}.tar.gz.sha1
wget http://sa-update.ena.com/${REV}.tar.gz.asc
sa-update -v --install ${REV}.tar.gz

(reload/restart whatever is calling SA -- spamd, amavisd-new, 
mimedefang, MailScanner, etc.)


I have applied this ruleset to my platforms and will monitor 
scoring/blocking over the next couple of days.


On Tue, Nov 14, 2017 at 2:36 PM, John Hardin wrote:


    On Tue, 14 Nov 2017, Rafael Leiva-Ochoa wrote:

        I am running SpamAssassin 3.4.1, and I have been trying to
        update the DB located in /var/lib/spamassassin/3.004001/
        using sa-update. But it has not gotten an update in almost
        2 weeks.

    The rules update service has been down due to infrastructure
    problems for a few months now. It is very close (like, this week)
    to being fixed.

    --   John Hardin KA7OHZ http://www.impsec.org/~jhardin/
    jhar...@impsec.org  FALaholic #11174
    pgpk -a jhar...@impsec.org
    key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

---
   We should endeavour to teach our children to be gun-proof
   rather than trying to design our guns to be child-proof
---
  229 days since the first commercial re-flight of an orbital
    booster (SpaceX)











Re: SA-Update not updating DB

2017-11-16 Thread Dave Wreski



REV=1815298
wget http://sa-update.ena.com/${REV}.tar.gz
wget http://sa-update.ena.com/${REV}.tar.gz.sha1
wget http://sa-update.ena.com/${REV}.tar.gz.asc
sa-update -v --install ${REV}.tar.gz

(reload/restart whatever is calling SA -- spamd, amavisd-new,
mimedefang, MailScanner, etc.)

I have applied this ruleset to my platforms and will monitor
scoring/blocking over the next couple of days.


Hmm, the file doesn't seem to be able to be found unless of course I
did something incorrectly:

chris@localhost:~/Downloads$ wget http://sa-update.ena.com/${REV}.tar.gz
--2017-11-16 08:51:50--  http://sa-update.ena.com/.tar.gz
Resolving sa-update.ena.com (sa-update.ena.com)... 96.4.1.5, 96.5.1.5
Connecting to sa-update.ena.com (sa-update.ena.com)|96.4.1.5|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-11-16 08:51:50 ERROR 404: Not Found.


You forgot to set the REV shell variable first. Copy the whole block, 
including the REV=1815298 line, paste it on the command line as root, 
and it should work.


Regards,
Dave






Re: SA-Update not updating DB

2017-11-16 Thread David Jones

On 11/16/2017 08:57 AM, Chris wrote:

On Thu, 2017-11-16 at 07:22 -0600, David Jones wrote:

Great news!  Last night's run finally produced a full 72_scores.cf.
Big thanks to Merijn van den Kroonenberg for helping track down the
remaining issues!  There was a difference of about 3 rules, which could
be expected after an 8-month gap.

# cat disappeared_rules.txt
ADVANCE_FEE_4_NEW
CN_B2B_SPAMMER
URI_GOOGLE_PROXY

# wc -l 72_scores.cf
149 72_scores.cf

Now 149 lines and we were stuck at around 100 before.

https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/

WE REALLY NEED TESTERS NOW TO APPLY THIS UPDATE AND PROVIDE FEEDBACK.
I would like to enable DNS updates again for sa-update on Sunday or
Monday depending on the feedback.

REV=1815298
wget http://sa-update.ena.com/${REV}.tar.gz
wget http://sa-update.ena.com/${REV}.tar.gz.sha1
wget http://sa-update.ena.com/${REV}.tar.gz.asc
sa-update -v --install ${REV}.tar.gz

(reload/restart whatever is calling SA -- spamd, amavisd-new,
mimedefang, MailScanner, etc.)

I have applied this ruleset to my platforms and will monitor
scoring/blocking over the next couple of days.


Hmm, the file doesn't seem to be able to be found unless of course I
did something incorrectly:

chris@localhost:~/Downloads$ wget http://sa-update.ena.com/${REV}.tar.gz
--2017-11-16 08:51:50--  http://sa-update.ena.com/.tar.gz
Resolving sa-update.ena.com (sa-update.ena.com)... 96.4.1.5, 96.5.1.5
Connecting to sa-update.ena.com (sa-update.ena.com)|96.4.1.5|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-11-16 08:51:50 ERROR 404: Not Found.



Make sure you ran the "REV=1815298" line first to set the variable that 
the next 4 lines use with "${REV}".
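
A minimal sketch of what happened in the failed download, assuming a POSIX shell. build_url is a hypothetical helper used only for illustration; the URL pattern is the one from the commands quoted above. An unset variable expands to an empty string, which is why wget ended up asking for /.tar.gz, while the ${REV:?message} form makes the shell stop with an error instead.

```shell
#!/bin/sh
# build_url: print the tarball URL for whatever REV currently holds.
build_url() {
    printf 'http://sa-update.ena.com/%s.tar.gz\n' "${REV}"
}

unset REV
build_url    # prints http://sa-update.ena.com/.tar.gz (the URL in the 404)

REV=1815298
build_url    # prints http://sa-update.ena.com/1815298.tar.gz

# ${REV:?...} refuses to expand an unset or empty variable. It is run in
# a subshell here so that the failure does not end this script.
unset REV
( : "${REV:?REV is not set}" ) 2>/dev/null || echo "REV guard tripped"
```

Writing the wget lines with "${REV:?set REV first}" in place of "${REV}" would have turned the 404 into an immediate, self-explanatory error.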


--
David Jones


Re: SA-Update not updating DB

2017-11-16 Thread Chris
On Thu, 2017-11-16 at 07:22 -0600, David Jones wrote:
> Great news!  Last night's run finally produced a full 72_scores.cf.
> Big thanks to Merijn van den Kroonenberg for helping track down the
> remaining issues!  There was a difference of about 3 rules, which
> could be expected after an 8-month gap.
> 
> # cat disappeared_rules.txt
> ADVANCE_FEE_4_NEW
> CN_B2B_SPAMMER
> URI_GOOGLE_PROXY
> 
> # wc -l 72_scores.cf
> 149 72_scores.cf
> 
> Now 149 lines and we were stuck at around 100 before.
> 
> https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/
> 
> WE REALLY NEED TESTERS NOW TO APPLY THIS UPDATE AND PROVIDE FEEDBACK.
> I would like to enable DNS updates again for sa-update on Sunday or
> Monday depending on the feedback.
> 
> REV=1815298
> wget http://sa-update.ena.com/${REV}.tar.gz
> wget http://sa-update.ena.com/${REV}.tar.gz.sha1
> wget http://sa-update.ena.com/${REV}.tar.gz.asc
> sa-update -v --install ${REV}.tar.gz
> 
> (reload/restart whatever is calling SA -- spamd, amavisd-new,
> mimedefang, MailScanner, etc.)
> 
> I have applied this ruleset to my platforms and will monitor 
> scoring/blocking over the next couple of days.
> 
Hmm, the file doesn't seem to be able to be found unless of course I
did something incorrectly:

chris@localhost:~/Downloads$ wget http://sa-update.ena.com/${REV}.tar.gz
--2017-11-16 08:51:50--  http://sa-update.ena.com/.tar.gz
Resolving sa-update.ena.com (sa-update.ena.com)... 96.4.1.5, 96.5.1.5
Connecting to sa-update.ena.com (sa-update.ena.com)|96.4.1.5|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-11-16 08:51:50 ERROR 404: Not Found.

-- 
Chris
KeyID 0xE372A7DA98E6705C
31.11972; -97.90167 (Elev. 1092 ft)
08:55:52 up 9 days, 27 min, 1 user, load average: 7.24, 2.93, 1.31
Description:Ubuntu 16.04.3 LTS, kernel 4.10.0-38-generic




Re: potential new SA feature: Direct DNS Querying Per DNSBL Zone

2017-11-16 Thread RW
On Wed, 15 Nov 2017 12:03:58 -0500
Rob McEwen wrote:


> Why is this "Direct DNS Querying Per DNSBL Zone" feature
> needed/important?

In most of these cases you'd be better-off simply setting "dns_server"
in the SA configuration. This eliminates the effect of changes to
resolv.conf, and the setting takes a port value, so it needn't even
point to localhost:53.
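
As a sketch of that setting, the snippet below writes a small configuration fragment that points SpamAssassin at a local caching resolver on a non-standard port. The directory, the file name dns.cf, and the address 127.0.0.1:5353 are assumptions chosen for illustration; adjust them to the local installation.

```shell
#!/bin/sh
# CONF_DIR is a placeholder; point it at your site configuration
# directory (the location varies between distributions).
CONF_DIR=${CONF_DIR:-./sa-conf-example}
mkdir -p "$CONF_DIR"

# Quoted heredoc: the fragment is written exactly as shown.
cat > "$CONF_DIR/dns.cf" <<'EOF'
# Send SpamAssassin's DNS queries to a local caching resolver on a
# non-standard port instead of whatever resolv.conf lists.
dns_server 127.0.0.1:5353
EOF
```

Running `spamassassin --lint` afterwards is a quick way to check that the configuration still parses.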

The change does provide a benefit where an admin can't even start a
daemon on a non-standard port, but I think its general usefulness has
been greatly inflated.

What is interesting about this is that, if it were implemented in full
with DNS caching, it wouldn't be much more difficult to have SA do an NS
look-up to find the authoritative servers for each list. That would
allow network tests to work correctly by default.











Re: SA-Update not updating DB

2017-11-16 Thread David Jones
Great news!  Last night's run finally produced a full 72_scores.cf. Big 
thanks to Merijn van den Kroonenberg for helping track down the 
remaining issues!  There was a difference of about 3 rules, which could
be expected after an 8-month gap.


# cat disappeared_rules.txt
ADVANCE_FEE_4_NEW
CN_B2B_SPAMMER
URI_GOOGLE_PROXY

# wc -l 72_scores.cf
149 72_scores.cf

Now 149 lines and we were stuck at around 100 before.

https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/

WE REALLY NEED TESTERS NOW TO APPLY THIS UPDATE AND PROVIDE FEEDBACK.  I 
would like to enable DNS updates again for sa-update on Sunday or Monday 
depending on the feedback.


REV=1815298
wget http://sa-update.ena.com/${REV}.tar.gz
wget http://sa-update.ena.com/${REV}.tar.gz.sha1
wget http://sa-update.ena.com/${REV}.tar.gz.asc
sa-update -v --install ${REV}.tar.gz

(reload/restart whatever is calling SA -- spamd, amavisd-new, mimedefang, 
MailScanner, etc.)
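
The steps above can be wrapped so that a missing revision number stops the run before anything is downloaded. This is a sketch under assumptions, not the project's procedure: fetch_ruleset, FETCH and INSTALL are hypothetical names, and only the wget and sa-update invocations are taken from this message.

```shell
#!/bin/sh
# FETCH and INSTALL are hypothetical override hooks so the function can
# be dry-run (for example with FETCH=echo INSTALL=echo).
FETCH=${FETCH:-wget}
INSTALL=${INSTALL:-sa-update -v --install}

# fetch_ruleset REV: download the tarball and its .sha1 and .asc
# companions, then hand the tarball to sa-update. The ${1:?...}
# expansion aborts with an error if no revision is given.
fetch_ruleset() {
    rev=${1:?usage: fetch_ruleset REV}
    for suffix in tar.gz tar.gz.sha1 tar.gz.asc; do
        # Deliberately unquoted so "wget" plus any options split into words.
        $FETCH "http://sa-update.ena.com/${rev}.${suffix}" || return 1
    done
    $INSTALL "${rev}.tar.gz"
}
```

Usage would be `fetch_ruleset 1815298`, followed by the reload or restart described above.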


I have applied this ruleset to my platforms and will monitor 
scoring/blocking over the next couple of days.


On Tue, Nov 14, 2017 at 2:36 PM, John Hardin wrote:


    On Tue, 14 Nov 2017, Rafael Leiva-Ochoa wrote:

        I am running SpamAssassin 3.4.1, and I have been trying to
        update the DB located in /var/lib/spamassassin/3.004001/
        using sa-update. But it has not gotten an update in almost
        2 weeks.

    The rules update service has been down due to infrastructure
    problems for a few months now. It is very close (like, this week)
    to being fixed.

    --   John Hardin KA7OHZ http://www.impsec.org/~jhardin/
    jhar...@impsec.org  FALaholic #11174
    pgpk -a jhar...@impsec.org
    key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

---
   We should endeavour to teach our children to be gun-proof
   rather than trying to design our guns to be child-proof
---
  229 days since the first commercial re-flight of an orbital
    booster (SpaceX)







--
David Jones



Re: all recipients with the same first character

2017-11-16 Thread hamann . w

>> 
>> Dear All,
>> 
>> Analyzing some e-mails which are not caught by SA, I sometimes see the 
>> following scenario:
>> Such an e-mail is sent to a lot of people (not only to our own domain) and 
>> all e-mail addresses start with the same first character.
>> If I see this I know immediately this is spam.
>> 
>> Is there a rule anywhere which can detect such behavior?
>> 
>> Kind regards
>> Hans
>> 
>> 
>> 
Hi Hans,

I am quite happy with a small whitelist of senders that I let through if I am 
not the only recipient.
It took me a while to whitelist all the mailing lists, though.

Regards
Wolfgang




Re: The rise of highly targeted spam emails

2017-11-16 Thread Martin Gregorie
On Thu, 2017-11-16 at 09:15 +, Sebastian Arcus wrote:
> On 15/11/17 18:11, Martin Gregorie wrote:
> > On Wed, 2017-11-15 at 14:44 +, Sebastian Arcus wrote:
> 
> 
> > 
> > I initially decided that an archive was A Good Thing to have,
> > simply because retrieving mail from it should be a lot faster than
> > searching through huge mail folders. This turned out to be true in
> > practice: the archive currently holds 183,000 emails and a worst
> > case search takes around 30 seconds to return a list of hits
> > (running on a 3 GHz dual Athlon system with 4GB RAM and Fedora 25
> > as its OS).
> 
> Thank you for the details. How do you search the archive? With grep 
> directly on the server?
>
Using SQL queries.

The two main tables in the database hold e-mail addresses and messages
respectively plus there are many-to-many links between the two that are
implemented with a third table that holds the link type ('To' or
'From') and an additional table containing subject text - this has a
one-to-many relationship with the messages.

The SA plugin just looks at the From header in the message being
checked and, if it finds that address in the database, sees if there
are any 'To' links associated with it. If there are, then the message
gets negative points. As I said, this SQL query is actually run against
a database view that combines the address and link tables. Since the
rows on these tables are small and the tables are indexed on address
and link type, the query is very fast.
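
The layout described above can be sketched as PostgreSQL DDL. Every table, column, view and file name below is hypothetical (the actual archive schema is not shown in this thread), and the final SELECT only illustrates the kind of lookup the plugin is described as doing. The script just writes the SQL to a file.

```shell
#!/bin/sh
# Write an illustrative schema to archive_schema.sql. All identifiers
# are assumptions made for this sketch, not the real archive's names.
cat > archive_schema.sql <<'EOF'
-- Subject text lives in its own table: one subject, many messages.
CREATE TABLE subject (
    id           SERIAL PRIMARY KEY,
    subject_text TEXT NOT NULL
);

CREATE TABLE message (
    id         SERIAL PRIMARY KEY,
    subject_id INTEGER REFERENCES subject (id),
    sent_at    TIMESTAMP,
    body       TEXT
);

CREATE TABLE address (
    id    SERIAL PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);

-- Many-to-many link between addresses and messages, typed To or From.
CREATE TABLE link (
    address_id INTEGER NOT NULL REFERENCES address (id),
    message_id INTEGER NOT NULL REFERENCES message (id),
    link_type  TEXT    NOT NULL CHECK (link_type IN ('To', 'From')),
    PRIMARY KEY (address_id, message_id, link_type)
);
CREATE INDEX link_addr_type_idx ON link (address_id, link_type);

-- View combining the address and link tables.
CREATE VIEW address_link AS
    SELECT a.email, l.link_type, l.message_id
    FROM address a
    JOIN link l ON l.address_id = a.id;

-- Has mail ever been sent To this From address?
SELECT EXISTS (
    SELECT 1 FROM address_link
    WHERE email = 'someone@example.org' AND link_type = 'To'
) AS known_correspondent;
EOF
```

The file could then be loaded into a scratch database with `psql -f archive_schema.sql`.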

If you want to know more about the archive, look here:
http://www.libelle-systems.c3487738.myzen.co.uk/mailarchive/

Ignore the licensing stuff: I initially thought I might be onto a
revenue source, but remarkably few people use mail archives. I should
remove the license management code and open source the archive but so
far haven't got round to doing that.

Martin

 




all recipients with the same first character

2017-11-16 Thread MAYER Hans

Dear All,

Analyzing some e-mails which are not caught by SA, I sometimes see the following 
scenario:
Such an e-mail is sent to a lot of people (not only to our own domain) and 
all e-mail addresses start with the same first character.
If I see this I know immediately this is spam.

Is there a rule anywhere which can detect such behavior?

Kind regards
Hans




Re: The rise of highly targeted spam emails

2017-11-16 Thread Sebastian Arcus


On 15/11/17 18:11, Martin Gregorie wrote:

On Wed, 2017-11-15 at 14:44 +, Sebastian Arcus wrote:




I initially decided that an archive was A Good Thing to have, simply
because retrieving mail from it should be a lot faster than searching
through huge mail folders. This turned out to be true in practice: the
archive currently holds 183,000 emails and a worst case search takes
around 30 seconds to return a list of hits (running on a 3 GHz dual
Athlon system with 4GB RAM and Fedora 25 as its OS).


Thank you for the details. How do you search the archive? With grep 
directly on the server?


Re: The rise of highly targeted spam emails

2017-11-16 Thread Sebastian Arcus

On 15/11/17 15:16, Reindl Harald wrote:



On 15.11.2017 at 15:47, Sebastian Arcus wrote:

On 15/11/17 09:56, Reindl Harald wrote:


On 15.11.2017 at 09:41, Sebastian Arcus wrote:
I can't really train the bayesian filter on these emails, as it 
would start to affect ham emails classification


this is an unproven claim!

we have phishing mails in bayes here which are classified with BAYES_99 
where my human eyes can hardly distinguish them from the original 
messages classified with BAYES_00 - you just need to train both and 
bayes will find the differences over time


I'm not sure I understand this. In my limited knowledge of how 
bayesian filters work, I assumed that if the words are the 
same/similar between emails, they should produce similar bayes scores, 
no? Do you have any links to explanations of how this would work? I am 
keen not to adversely affect the bayes databases I have built up over 
time.


bayes also takes headers into account as well as a lot of invisible 
stuff. fact is that we block all the DHL phishings which have existed 
over the last few years, and a short while ago i saw something 
apparently new with a foreign envelope/from address failing SPF, where a 
dhl.com server sent on behalf of the customer, and that thing was 
correctly classified with BAYES_00 even without whitelist_auth


and yes, i have QA scripts iterating over all the spam and ham samples 
collected since 2014, which test the current bayes classification and 
alert if spam does not get BAYES_99 or ham does not get BAYES_00; in 
that case "sa-retrain.sh sample-path" makes 5 copies with some modified 
headers like message-id and retrains them
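
A rough sketch of such a QA pass, assuming one message per file in a directory. check_samples and the CLASSIFY hook are hypothetical names, and this is not the script described above (sa-retrain.sh is not shown in the thread). It assumes that "spamassassin -L -t" prints a report naming the rules that hit.

```shell
#!/bin/sh
# CLASSIFY is the command that scores one message read from stdin.
# Assumption: "spamassassin -L -t" (local tests only, test mode) prints
# a report that names the rules that hit, such as BAYES_00 or BAYES_99.
CLASSIFY=${CLASSIFY:-spamassassin -L -t}

# check_samples DIR RULE: list every file in DIR whose report lacks RULE.
check_samples() {
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue
        # Unquoted $CLASSIFY so the command and its options split into words.
        if ! $CLASSIFY < "$msg" | grep -q "$2"; then
            echo "MISMATCH: $msg (expected $2)"
        fi
    done
}
```

It would be called once per corpus, for example `check_samples /path/to/spam BAYES_99` and `check_samples /path/to/ham BAYES_00`, with the sample paths being placeholders. Mismatches could then be retrained, for instance with `sa-learn --spam` or `sa-learn --ham`.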


Interesting - thank you for the details. Is this your personal 
mailbox(es) - or a larger setup?