Re: Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Sebastian Arcus

On 12/02/16 19:14, Reindl Harald wrote:



On 12.02.2016 at 20:06, Marc Perkel wrote:

Any chance that the parent directory structure doesn't have enough
permissions?

The error message says it can't access it so there's your clue. Since
the files themselves seem to have good permissions I would look at the
directories.


See my previous mail - that was already verified.
Looking closer, "No such file or directory" is not a permission problem.

There was a hint: "please re-run with -D".

In any case, re-using bayes on different servers, even across different 
operating systems, is no problem: our bayes has been running on 3 of our 
own and 2 foreign machines for a long time now with great results


I've checked and triple checked everything. Unless I'm missing something 
blindingly obvious, I don't think that error message is accurate. If I 
delete the bayes files and restart spamd, on running sa-learn, new ones 
are created in exactly the same place, with same name and same 
permissions - and they work fine. But the ones brought over from the 
other server don't work.


PS - Regarding the "re-run with -D for more information" - I guess that 
message is slightly pointless, as it keeps on saying that even when you 
run it with "-D"


Re: Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Sebastian Arcus

On 12/02/16 16:59, Reindl Harald wrote:



On 12.02.2016 at 17:29, Sebastian Arcus wrote:

As per advice from this list, I have been re-using my bayes databases on
several different servers running SA. On one of the servers though, the
database is not accepted. I re-transferred them several times over ssh,
to make sure they were not corrupted. The database files are in the
correct location, with correct permissions and owned by the correct 
user:


# ls -l /var/spool/spamd/bayes/
total 5912
-rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
-rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks

The version of SA on both the donor and receiving servers is 3.4.1.

When I try to learn a new message on the receiving server (where I moved
the bayes files), I get the following error:


su - spamd
stat /var
stat /var/spool
stat /var/spool/spamd
stat /var/spool/spamd/bayes

Linux is not like Windows - if you don't have access to a parent folder, 
you just don't have access



Sorry - previous reply sent in HTML format by mistake:

root@mdr-server:/# su - spamd
No directory, logging in with HOME=/

spamd@mdr-server:/$ stat /var
  File: `/var'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 12  Links: 16
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2016-01-18 09:28:23.0 +
Modify: 2016-01-18 09:22:47.0 +
Change: 2016-01-18 09:28:23.744774236 +

spamd@mdr-server:/$ stat /var/spool
  File: `/var/spool'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 118  Links: 22
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-02-03 14:28:33.0 +
Modify: 2015-12-03 17:41:28.859794403 +
Change: 2015-12-03 17:41:28.859794403 +

spamd@mdr-server:/$ stat /var/spool/spamd
  File: `/var/spool/spamd'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 15473107  Links: 3
Access: (0770/drwxrwx---)  Uid: ( 1037/   spamd)   Gid: (  252/ spamd)
Access: 2015-12-03 17:41:28.859794403 +
Modify: 2015-12-03 17:41:32.011239989 +
Change: 2015-12-03 17:46:59.187806044 +

spamd@mdr-server:/$ stat /var/spool/spamd/bayes/
  File: `/var/spool/spamd/bayes/'
  Size: 4096  Blocks: 8  IO Block: 4096   directory
Device: 900h/2304d  Inode: 15473106  Links: 3
Access: (0776/drwxrwxrw-)  Uid: ( 1037/   spamd)   Gid: (  252/ spamd)
Access: 2015-12-03 17:41:32.011239989 +
Modify: 2016-02-12 16:20:53.778709980 +
Change: 2016-02-12 16:20:53.778709980 +
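Reindl's point about parent directories can be checked in one pass. The sketch below uses a hypothetical helper, `check_path_access`, which is not part of SpamAssassin. It walks every component of an absolute path as the current user and stops at the first component that is missing or cannot be searched. If the path ends in a regular file, it also checks read and write access, which sa-learn needs for `bayes_toks`. In the `stat` output above, every directory on the way to `/var/spool/spamd/bayes` is searchable by `spamd`, so this check would be expected to pass here.

```shell
#!/bin/sh
# Sketch: walk each component of an ABSOLUTE path as the current user and
# report the first one that is missing or not searchable (no x bit for us).
# For a regular file at the end, also check read and write access.
check_path_access() (
    # Subshell body: the IFS and set -f changes do not leak to the caller.
    IFS=/
    set -f                      # no globbing while splitting on "/"
    path=""
    for part in $1; do
        [ -z "$part" ] && continue
        path="$path/$part"
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            exit 1
        fi
        if [ -d "$path" ] && [ ! -x "$path" ]; then
            echo "no search permission: $path"
            exit 1
        fi
    done
    if [ -f "$path" ] && { [ ! -r "$path" ] || [ ! -w "$path" ]; }; then
        echo "not readable/writable: $path"
        exit 1
    fi
    echo "ok: $path"
)
```

Run it as the `spamd` user, for example after `su - spamd` as in the thread. On systems with util-linux, `namei -l /var/spool/spamd/bayes/bayes_toks` prints a similar per-component listing with permissions.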




Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Sebastian Arcus
As per advice from this list, I have been re-using my bayes databases on 
several different servers running SA. On one of the servers though, the 
database is not accepted. I re-transferred them several times over ssh, 
to make sure they were not corrupted. The database files are in the 
correct location, with correct permissions and owned by the correct user:


# ls -l /var/spool/spamd/bayes/
total 5912
-rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
-rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks

The version of SA on both the donor and receiving servers is 3.4.1.

When I try to learn a new message on the receiving server (where I moved 
the bayes files), I get the following error:


# su - spamd -c "/usr/bin/sa-learn -D --spam /New\ UnansweredSexHookup\ Request.eml"

Feb 12 16:20:53.777 [12973] dbg: locker: mode is 438
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: created /var/spool/spamd/bayes/bayes.lock.mdr-server.mdrinteriors.co.uk.12973
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: trying to get lock on /var/spool/spamd/bayes/bayes with 0 retries
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: link to /var/spool/spamd/bayes/bayes.lock: link ok
Feb 12 16:20:53.778 [12973] dbg: bayes: tie-ing to DB file R/W /var/spool/spamd/bayes/bayes_toks

Feb 12 16:20:53.779 [12973] dbg: bayes: untie-ing DB file toks
Feb 12 16:20:53.779 [12973] dbg: locker: safe_unlock: unlink /var/spool/spamd/bayes/bayes.lock
bayes: cannot open bayes databases /var/spool/spamd/bayes/bayes_* R/W: tie failed: No such file or directory

Learned tokens from 0 message(s) (1 message(s) examined)
Feb 12 16:20:53.779 [12973] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x93106d0) implements 'learner_close', priority 0
ERROR: the Bayes learn function returned an error, please re-run with -D for more information at /usr/bin/sa-learn line 498.
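The thread does not establish why the copied files fail while freshly created ones work. One possibility, offered here only as an assumption, is that the raw Berkeley DB files written on the donor are in a format the receiving server's DB library cannot open, even though both servers run SpamAssassin 3.4.1. You can avoid copying the raw DB files altogether by using sa-learn's own text dump: run `sa-learn --backup` on the donor and `sa-learn --restore` on the receiver. The function names below are hypothetical wrappers around those two options.

```shell
#!/bin/sh
# Sketch: move Bayes data as a text dump instead of copying the raw
# Berkeley DB files. Run both functions as the user that owns the
# site-wide database (spamd in this thread).

# On the donor server: write the dump to the file named in $1.
export_bayes() {
    sa-learn --backup > "$1"
}

# On the receiving server: refuse a missing or empty dump, then restore it.
import_bayes() {
    if [ ! -s "$1" ]; then
        echo "dump file '$1' is missing or empty" >&2
        return 1
    fi
    sa-learn --restore "$1"
}
```

For example, run `export_bayes /tmp/bayes.dump` on the donor and copy the file across (with scp, say). Then run `import_bayes /tmp/bayes.dump` on the receiver. The dump file name is only an illustration.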




Re: Word macros

2015-12-22 Thread Sebastian Arcus

On 22/12/15 10:07, Reindl Harald wrote:



On 22.12.2015 at 10:26, Sebastian Arcus wrote:

In terms of ClamAV, I've had next to zero hit rates for new viruses
arriving over email in the last few months (although it is being updated
regularly) - so I'm starting to wonder if there is any point in using
ClamAV for scanning emails at all


Without the Sanesecurity signatures, likely not.


Thanks for that - I wasn't aware of their existence. I'll give them a go




Re: Word macros

2015-12-22 Thread Sebastian Arcus

On 22/12/15 08:04, Axb wrote:

On 12/21/2015 11:46 PM, Alex wrote:

Hi all,

For the past few days we've been hit with Word macro viruses/spam that
isn't being tagged by clamav or spamassassin, and I thought someone
might be able to take a look:

http://pastebin.com/cAWcAbm2

This one still isn't tagged by clamav/sanesecurity. I've submitted
this sample, so perhaps it is now, but I thought someone might have
some ideas for a meta or something else in the message that could more
generally tag these?

Anyone else seeing these? I've also already added the IP to the client
blocklist.



You may need to add some commercial AV to your layer...

https://www.virustotal.com/en/file/cbf2c9dd334e786e53958927c05ac1c3f749de21e9e0b1cb551c5b8dd3e34a56/analysis/1450770862/ 



Quite a few to choose from...


I've been seeing some of these Word docs with macros in the last few 
days as well. The worrying thing is that some of the (reputable) 
commercial AV scanners still don't detect them after being in the wild 
for at least two days:


https://www.virustotal.com/en/file/b2a8a2afe818469ba48a3dbafec9ce4ed49ebc0ab7ff0de68f743e4eab3fa5e1/analysis/1450775869/

In terms of ClamAV, I've had next to zero hit rates for new viruses 
arriving over email in the last few months (although it is being updated 
regularly) - so I'm starting to wonder if there is any point in using 
ClamAV for scanning emails at all.


Re: Strange behaviour by the AWL module

2015-12-13 Thread Sebastian Arcus

On 12/12/15 23:43, Benny Pedersen wrote:
On December 12, 2015 8:33:28 PM Sebastian Arcus wrote:



I guess I must be using the default settings - as I don't think I've
configured anything in particular for AWL


Change the default /16 CIDR to /24 for IPv4, and use /64 for IPv6. If you 
would rather track on /32 for IPv4, then each IPv4 address will have no 
shared AWL scores.


You could also change the default AWL factor from 0.5 to 0.25; that will 
reduce how much benefit a sender gets from previous scores.


If you change the settings, delete the AWL database.

Thank you - for the time being I've disabled the AWL module - as I've 
worked out that on my type of setup it doesn't appear to be really needed.




Re: Strange behaviour by the AWL module

2015-12-13 Thread Sebastian Arcus

On 12/12/15 19:57, John Hardin wrote:

On Sat, 12 Dec 2015, Sebastian Arcus wrote:


On 12/12/15 18:21, John Hardin wrote:

 On Sat, 12 Dec 2015, Sebastian Arcus wrote:

> One of my servers received a spam message which SA missed, with the
> following report:
>
> -0.4 AWL                    AWL: Adjusted score from AWL reputation of From: address
>
> After learning the messages as spam into bayes with sa-learn, I get the
> following report:
>
> -6.1 AWL                    AWL: Adjusted score from AWL reputation of From: address
>
> Luckily the message is now flagged as spam because I have manually
> turned up the score on my BAYES_99 and BAYES_999 a while ago. But what
> intrigues me is that now the AWL module gives it a -6.1 score. Why would
> AWL now tilt things heavily towards ham, after the message has just been
> learned as spam? It seems to be making things worse instead of better.
> Unless I am misunderstanding what AWL is supposed to be doing?


 You are. The name is misleading. AWL is more a score averager than a
 whitelist. It's intended to allow for the occasionally spammy-looking
 email from a historically hammy sender (and vice versa).

 It has nothing to do with training, which only affects Bayes.

 Messages from that sender will get negative AWL scores for a while until
 their traffic history becomes more on the "spam" side.


OK - that's kind of what I assumed. What I don't understand is why 
the AWL score changes after the message has been learned into the 
Bayes database - and by so much?


It's not that you trained it into Bayes, but that SA had previously 
only seen email from that source address that was scored as ham. I'm 
assuming that's the first message you got from that source address? So 
their entire AWL history is 100% hammy based on the original FN.


You scan the message again, and it scores as spammy now for whatever 
reason; SA checks the AWL history for that sender address, sees 
"100% hammy" and generates a partially-offsetting negative score.


As that sender's AWL history shifts from "100% hammy" towards "99% 
spammy" (assuming you ever get mail from that address again) the 
offsetting score will head towards zero. I don't *think* AWL will 
generate positive scores for spams from a historically spammy sender 
(i.e. I think AWL is purely to offset the raw score for anomalies), so 
you should see AWL scores stop once their history is "mostly spammy".


Thank you for that explanation!
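John's explanation can be made concrete with the arithmetic below. It assumes the AWL plugin computes its adjustment roughly as Δ = f · (m - s). Here s is the message score before AWL, m is the stored mean score for that sender's history and f is `auto_whitelist_factor`, which defaults to 0.5 per Benny's message. The value of m is back-calculated from the first report; nobody reported it. The calculation also assumes m barely moved between the two scans. With those assumptions, both AWL scores in the original post are reproduced:

```latex
\begin{aligned}
\Delta_{\mathrm{AWL}} &= f\,(m - s), \qquad f = 0.5 \\[4pt]
\text{first scan, } s = 1.5 + 2.0 = 3.5:\quad -0.4 &= 0.5\,(m - 3.5) \;\Rightarrow\; m \approx 2.7 \\[4pt]
\text{after training, } s = 4.9 + 8.0 + 2.0 = 14.9:\quad \Delta_{\mathrm{AWL}} &= 0.5\,(2.7 - 14.9) = -6.1
\end{aligned}
```

The final totals match the reports: 3.5 - 0.4 = 3.1 and 14.9 - 6.1 = 8.8. The large negative AWL value comes from the jump in s after training, measured against an unchanged hammy history, which is John's point.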




Re: Strange behaviour by the AWL module

2015-12-12 Thread Sebastian Arcus


On 12/12/15 13:06, Benny Pedersen wrote:

Sebastian Arcus wrote on 2015-12-12 12:51:


Why
would AWL now tilt things heavily towards ham, after the message has
just been learned as spam?


It's how AWL works.


It seems to be making things worse instead
of better. Unless I am misunderstanding what AWL is supposed to be
doing?


What are your settings for the AWL plugin?

I guess I must be using the default settings - as I don't think I've 
configured anything in particular for AWL




Re: Strange behaviour by the AWL module

2015-12-12 Thread Sebastian Arcus


On 12/12/15 18:21, John Hardin wrote:

On Sat, 12 Dec 2015, Sebastian Arcus wrote:

One of my servers received a spam message which SA missed, with the 
following report:


-0.4 AWL                    AWL: Adjusted score from AWL reputation of From: address


After learning the messages as spam into bayes with sa-learn, I get 
the following report:


-6.1 AWL                    AWL: Adjusted score from AWL reputation of From: address



Luckily the message is now flagged as spam because I have manually 
turned up the score on my BAYES_99 and BAYES_999 a while ago. But what 
intrigues me is that now the AWL module gives it a -6.1 score. Why 
would AWL now tilt things heavily towards ham, after the message has 
just been learned as spam? It seems to be making things worse instead 
of better. Unless I am misunderstanding what AWL is supposed to be 
doing?


You are. The name is misleading. AWL is more a score averager than a 
whitelist. It's intended to allow for the occasionally spammy-looking 
email from a historically hammy sender (and vice versa).


It has nothing to do with training, which only affects Bayes.

Messages from that sender will get negative AWL scores for a while 
until their traffic history becomes more on the "spam" side.


OK - that's kind of what I assumed. What I don't understand is why the 
AWL score changes after the message has been learned into the Bayes 
database - and by so much?




Strange behaviour by the AWL module

2015-12-12 Thread Sebastian Arcus
One of my servers received a spam message which SA missed, with the 
following report:


Content analysis details:   (3.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                            (noreply[at]live.com)
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.4993]
 2.0 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
-0.4 AWL                    AWL: Adjusted score from AWL reputation of From: address


After learning the messages as spam into bayes with sa-learn, I get the 
following report:



Content analysis details:   (8.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 4.9 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                            (noreply[at]live.com)
 0.0 HTML_MESSAGE           BODY: HTML included in message
 8.0 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 2.0 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
-6.1 AWL                    AWL: Adjusted score from AWL reputation of From: address



Luckily the message is now flagged as spam because I have manually 
turned up the score on my BAYES_99 and BAYES_999 a while ago. But what 
intrigues me is that now the AWL module gives it a -6.1 score. Why would 
AWL now tilt things heavily towards ham, after the message has just been 
learned as spam? It seems to be making things worse instead of better. 
Unless I am misunderstanding what AWL is supposed to be doing?


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Sebastian Arcus

On 03/12/15 01:40, Reindl Harald wrote:



On 03.12.2015 at 01:14, Alex wrote:

On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren wrote:

On 2015-12-02 09:14, Sebastian Arcus wrote:


Perfect - that's exactly the sort of real-life based advice I was looking 
for. Many thanks!


I run a small shared hosting environment, with a global bayes for all users 
as not enough users are ready/willing/able to take the time to sort ham 
(although more will press "this is spam") and in general, the results work 
out well enough.


A portion of the bayes database is the header information from the
email. What does it mean for those headers that contain info specific
to a particular domain or site when it's transferred to another domain
or site where those specifics will be different?


See the attached PHP/formail script and list of ignored/stripped headers.

We strip a large portion of the headers, especially the Received 
headers, from all samples with "formail" and prepend a generic one on 
top, before training them.

Does that mean that transferring bayes databases between sites without 
stripping the headers wouldn't work - or is it just more effective if 
one strips the headers?
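Reindl's attached script and header list are not reproduced in the archive. The sketch below therefore illustrates the idea only for the one header he names. It removes every `Received:` field, including folded continuation lines, from a sample before training, and puts one generic `Received:` header on top. The function name and the default generic header are hypothetical. The sketch does not use `formail`, which is the tool Reindl actually uses.

```shell
#!/bin/sh
# Sketch: drop every Received: header (and its folded continuation lines)
# from a message on stdin, put one generic Received: header on top, and
# pass the body through untouched. Assumes LF line endings.
# The default generic header text is a placeholder.
strip_received() {
    generic=${1:-"Received: from localhost (localhost [127.0.0.1])"}
    printf '%s\n' "$generic"
    awk '
        BEGIN { in_headers = 1; skipping = 0 }
        in_headers && /^$/ { in_headers = 0 }          # blank line ends headers
        in_headers && /^[ \t]/ { if (skipping) next }  # folded line follows its parent
        in_headers && /^[^ \t]/ {                      # start of a new header field
            skipping = (tolower($0) ~ /^received:/)
            if (skipping) next
        }
        { print }
    '
}
```

For example, run `strip_received < sample.eml > stripped.eml`, then train on `stripped.eml` with sa-learn as usual.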




Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Sebastian Arcus


On 03/12/15 00:29, Charles Sprickman wrote:

Reindl Harald wrote:



On 02.12.2015 at 21:50, Charles Sprickman wrote:

Reindl Harald wrote:


On 02.12.2015 at 12:51, Sebastian Arcus wrote:

I hope I'm not exceeding the patience of the list by posting a third
question in two days :-)

I realise the above question is a "soft" question, probably without a
definite "yes" or "no" answer. I am hoping that people with experience
of using SA in various environments might be able to throw in some
opinions. Based on the documentation, it is clearly possible to transfer
a bayes database from one install to another - especially if it is a
sitewide database. What I was wondering is if it is worth doing so from
a results point of view

we use our global bayes on the incoming MX and share it with our submission 
servers to stop outgoing spam from hacked accounts

This is a bit OT, but I have had a hard time finding how to setup a global 
bayes DB rather than having everything done on a per-user basis.  Looking 
around the SA wiki, I don’t see global DBs addressed.  Any tips?

https://wiki.apache.org/spamassassin/SiteWideBayesSetup

In case you are running spamass-milter, that's even the logical default, because 
your milter runs as its own user, with its own .spamassassin directory 
in the user's home, which contains the db

I had a look at that page - I use mysql to store the data, have multiple spamd 
boxes, and spamc on the inbound servers passing mail to spamd once all the 
“front door” checks are done.  In that config, I end up with unique per-user 
bayes tokens.  I’m looking to just pool everyone together, but don’t see an 
obvious way to do that.  It seems like folks in this thread are however doing 
that somehow (perhaps just because they are using a milter or similar).

In case it helps: I use SA with Exim - and Exim talks over Unix sockets 
to the spamd daemon. I've used the instructions at the wiki page above to 
set up the sitewide bayes database - but I don't use MySQL - and it all 
seems to work as expected.
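In a file-based setup like this one, the wiki approach comes down to pointing every process at one database path. As a minimal sketch, the helper below prints the two `local.cf` lines for a shared directory. The helper name is hypothetical, but `bayes_path` and `bayes_file_mode` are real SpamAssassin settings. `bayes_path` is a prefix, so `/var/spool/spamd/bayes/bayes` produces `bayes_toks` and `bayes_seen` in that directory, matching the February listing. The default mode 0777 is the permissive value often used for a shared database. It is an assumption here, not a recommendation for every setup.

```shell
#!/bin/sh
# Hypothetical helper: print the local.cf directives that point every
# process (spamd, and sa-learn run by hand) at one shared Bayes database.
# bayes_path is a prefix: SpamAssassin appends _toks, _seen, ... to it.
sitewide_bayes_conf() {
    if [ -z "$1" ]; then
        echo "usage: sitewide_bayes_conf DIR [MODE]" >&2
        return 2
    fi
    dir=${1%/}                  # drop one trailing slash, if any
    mode=${2:-0777}             # permissive default; an assumption
    printf 'bayes_path %s/bayes\n' "$dir"
    printf 'bayes_file_mode %s\n' "$mode"
}
```

For example, `sitewide_bayes_conf /var/spool/spamd/bayes >> /etc/mail/spamassassin/local.cf` appends both lines. The location of local.cf varies by distribution, so treat that path as an assumption.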




Re: Is it worth transferring bayes data between different sites?

2015-12-02 Thread Sebastian Arcus

On 02/12/15 12:55, Reindl Harald wrote:



On 02.12.2015 at 12:51, Sebastian Arcus wrote:

I hope I'm not exceeding the patience of the list by posting a third
question in two days :-)

I realise the above question is a "soft" question, probably without a
definite "yes" or "no" answer. I am hoping that people with experience
of using SA in various environments might be able to throw in some
opinions. Based on the documentation, it is clearly possible to transfer
a bayes database from one install to another - especially if it is a
sitewide database. What I was wondering is if it is worth doing so from
a results point of view


we use our global bayes on the incoming MX and share it with our 
submission servers to stop outgoing spam from hacked accounts


Additionally, we share our bayes with another company, which pulls the 
dumps every 30 minutes if the hash file is different


Both we and the other company do mail hosting at ISP level, and the 
results on both sides are perfect - we even share scores, whitelists 
and custom body/subject rules, and the summary is: at least within 
the same country, sharing spam filter configurations works like a charm
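Reindl's approach of pulling the dumps when the hash changes can be sketched as below. The sketch assumes that the other site exports with `sa-learn --backup` and that the dump has already been fetched locally; the transfer itself is left out. `restore_if_changed` and its state file are hypothetical. The only SpamAssassin call is `sa-learn --restore`.

```shell
#!/bin/sh
# Sketch: re-import a shared Bayes dump only when its content changed.
# $1: dump file produced elsewhere with `sa-learn --backup` (already fetched)
# $2: local state file holding the SHA-256 of the last restored dump
restore_if_changed() {
    dump=$1
    state=$2
    if [ ! -s "$dump" ]; then
        echo "no dump at $dump" >&2
        return 1
    fi
    new_hash=$(sha256sum "$dump" | cut -d' ' -f1)
    old_hash=""
    [ -f "$state" ] && old_hash=$(cat "$state")
    if [ "$new_hash" = "$old_hash" ]; then
        echo "unchanged"                     # same dump as last time
        return 0
    fi
    sa-learn --restore "$dump" || return 1
    printf '%s\n' "$new_hash" > "$state"     # remember what was restored
    echo "restored"
}
```

To match the 30-minute interval, you could run it from cron as the user that owns the Bayes database. For example: `restore_if_changed /var/tmp/bayes.dump /var/tmp/bayes.dump.sha256` (both paths are illustrative).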


Perfect - that's exactly the sort of real-life based advice I was 
looking for. Many thanks!




Re: Detecting which shortcircuit rule fires

2015-12-02 Thread Sebastian Arcus

On 02/12/15 12:56, Reindl Harald wrote:



On 02.12.2015 at 12:29, Sebastian Arcus wrote:

On 02/12/15 09:49, Reindl Harald wrote:


On 02.12.2015 at 10:30, Sebastian Arcus wrote:

After properly configuring a bayes database and training it following
the great advice from this list, I am now having this problem where some
spam is not detected properly due to a shortcircuit rule. However, I'm
having some difficulty figuring out which one of them is causing the
problem. Here is the X-Spam-Report - which should cause the email to be
classed as spam, really:

But when the message goes through Exim


* show the headers of such a message
* there must be a rulename which was triggered for SC
* SC is *not* enabled by default
* SC is not even loaded by default

I can't even respond by quoting your headers

Thank you. What I've done:

1. I've disabled shortcircuiting in local.pre. This is not a busy
server, and on reflection it is not needed at all.
2. The shortcircuit does get disabled - although I thought it doesn't.
It turns out that I should have pointed "spamassassin -D" to the proper
site config files with the "--siteconfigpath". The mail daemon process
(spamd) does pick up the proper config files fine.

I can still post the headers of the email, if you think it would help
the list - but without shortcircuiting things look ok now


Well, I wonder which shortcircuit rules are active by default

I even needed "loadplugin Mail::SpamAssassin::Plugin::Shortcircuit" in 
"local.cf" on Fedora to get my own rules working at all

You are right - I'm on Slackware, and even here the shortcircuit plugin 
is not enabled by default. I must have enabled it myself in a misguided 
bout of enthusiasm.
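The plugin is only active when something loads it, so one quick check is to look for uncommented `loadplugin` lines in the site configuration. The sketch below uses a hypothetical helper name. Its default directory, `/etc/mail/spamassassin`, is an assumption and differs between distributions. It scans both `*.pre` and `*.cf` files because, as Reindl shows, `loadplugin` can also sit in `local.cf`.

```shell
#!/bin/sh
# Sketch: print every active (uncommented) loadplugin line for the
# Shortcircuit plugin in a site config directory. Returns 0 if at least
# one is found, 1 if the plugin is not loaded from that directory.
find_shortcircuit_loads() {
    dir=${1:-/etc/mail/spamassassin}
    found=1
    for f in "$dir"/*.pre "$dir"/*.cf; do
        [ -f "$f" ] || continue          # skip unmatched glob patterns
        if grep -Hn '^[[:space:]]*loadplugin[[:space:]][[:space:]]*Mail::SpamAssassin::Plugin::Shortcircuit' "$f"; then
            found=0
        fi
    done
    return "$found"
}
```

Pass it the same directory you give to `spamassassin -D --siteconfigpath=...`, so that the check matches the configuration that spamd actually reads.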





Is it worth transferring bayes data between different sites?

2015-12-02 Thread Sebastian Arcus
I hope I'm not exceeding the patience of the list by posting a third 
question in two days :-)


I realise the above question is a "soft" question, probably without a 
definite "yes" or "no" answer. I am hoping that people with experience 
of using SA in various environments might be able to throw in some 
opinions. Based on the documentation, it is clearly possible to transfer 
a bayes database from one install to another - especially if it is a 
sitewide database. What I was wondering is if it is worth doing so from 
a results point of view. For example, I now have a nicely trained bayes 
with a few thousand of my own ham, clean, hand-picked emails and a few 
hundred spam emails. Would there be a significant benefit in taking this 
data and using it to set up a fresh SA install on a client's server? Or 
would the fact that my specific email usage pattern, and the content of 
the email I receive, differ from those of another organisation render 
the bayes tokens/data useless - so that I am better off starting from 
scratch there?


I've tried searching online for a discussion on this topic, but the only 
relevant bit I found is this unanswered post from 2004:


https://mail-archives.apache.org/mod_mbox/spamassassin-users/200406.mbox/%3c20040621150409.ga23...@publinet.it%3E


Re: Detecting which shortcircuit rule fires

2015-12-02 Thread Sebastian Arcus

On 02/12/15 09:49, Reindl Harald wrote:


On 02.12.2015 at 10:30, Sebastian Arcus wrote:

After properly configuring a bayes database and training it following
the great advice from this list, I am now having this problem where some
spam is not detected properly due to a shortcircuit rule. However, I'm
having some difficulty figuring out which one of them is causing the
problem. Here is the X-Spam-Report - which should cause the email to be
classed as spam, really:

But when the message goes through Exim


* show the headers of such a message
* there must be a rulename which was triggered for SC
* SC is *not* enabled by default
* SC is not even loaded by default

I can't even respond by quoting your headers

Thank you. What I've done:

1. I've disabled shortcircuiting in local.pre. This is not a busy 
server, and on reflection it is not needed at all.
2. The shortcircuit does get disabled - although I thought it doesn't. 
It turns out that I should have pointed "spamassassin -D" to the proper 
site config files with the "--siteconfigpath". The mail daemon process 
(spamd) does pick up the proper config files fine.


I can still post the headers of the email, if you think it would help 
the list - but without shortcircuiting things look ok now.


Many thanks




Detecting which shortcircuit rule fires

2015-12-02 Thread Sebastian Arcus
After properly configuring a bayes database and training it following 
the great advice from this list, I am now having this problem where some 
spam is not detected properly due to a shortcircuit rule. However, I'm 
having some difficulty figuring out which one of them is causing the 
problem. Here is the X-Spam-Report - which should cause the email to be 
classed as spam, really:


X-Spam-Report:
*  1.2 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
*  [URIs: completercpt3040.com]
*  1.4 RCVD_IN_BRBL_LASTEXT RBL: No description available.
*  [192.74.251.225 listed in bb.barracudacentral.org]
*  0.1 URIBL_SBL_A Contains URL's A record listed in the SBL blocklist
*  [URIs: completercpt3040.com]
*  1.6 URIBL_SBL Contains an URL's NS IP listed in the SBL blocklist
*  [URIs: completercpt3040.com]
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
*  [score: 0.4954]
*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
*  0.0 LOTS_OF_MONEY Huge... sums of money
*  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
*  0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
*  2.0 ADVANCE_FEE_2_NEW_MONEY Advance Fee fraud and lots of money
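The points column of that report adds up to 8.6 (1.2 + 1.4 + 0.1 + 1.6 + 0.8 + 0.7 + 0.8 + 2.0). That is well above the default 5.0 threshold, so the -99 result has to come from something other than these rules. The small helper below, with a hypothetical name, does the same sum for any pasted X-Spam-Report header. It skips the `[URIs: ...]` detail lines and any wrapped continuation lines.

```shell
#!/bin/sh
# Sketch: add up the points column of an X-Spam-Report header on stdin.
# Only lines shaped like "*  <number> RULE ..." are counted.
sum_report_points() {
    awk '
        $1 == "*" && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ { total += $2 }
        END { printf "%.1f\n", total }
    '
}
```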


But when the message goes through Exim, it comes out with a -99 score. If 
I run through the debug output, it becomes clear that a shortcircuit 
rule fires - but it doesn't say which (see the last line of output - 
clearly one of the rules is whitelisting the message). I can't even 
disable shortcircuiting - although I've commented it out in v320.pre, 
the debug output shows it is still being used:


# spamassassin -D < FWD\ Attention\ Domain\ yourdomain.com\ Notice.eml 2>&1 | grep -i shortcircuit
Dec  2 09:11:28.896 [16962] dbg: plugin: loading Mail::SpamAssassin::Plugin::Shortcircuit from @INC
Dec  2 09:11:30.282 [16962] dbg: config: fixed relative path: /var/lib/spamassassin/3.004001/updates_spamassassin_org/60_shortcircuit.cf
Dec  2 09:11:30.282 [16962] dbg: config: using "/var/lib/spamassassin/3.004001/updates_spamassassin_org/60_shortcircuit.cf" for included file
Dec  2 09:11:30.283 [16962] dbg: config: read file /var/lib/spamassassin/3.004001/updates_spamassassin_org/60_shortcircuit.cf
Dec  2 09:11:34.192 [16962] dbg: plugin: Mail::SpamAssassin::Plugin::Shortcircuit=HASH(0x3a0cbf8) implements 'parsed_metadata', priority 0
Dec  2 09:11:34.394 [16962] dbg: plugin: Mail::SpamAssassin::Plugin::Shortcircuit=HASH(0x3a0cbf8) implements 'have_shortcircuited', priority 0
Dec  2 09:11:34.533 [16962] dbg: plugin: Mail::SpamAssassin::Plugin::Shortcircuit=HASH(0x3a0cbf8) implements 'hit_rule', priority 0

shortcircuit=no autolearn=no autolearn_force=no version=3.4.1
 -100 SHORTCIRCUIT   Not all rules were run, due to a shortcircuited rule




Re: How to tell if DnsBlocklists are definitely being used by my Spamassassin setup

2015-12-02 Thread Sebastian Arcus

On 01/12/15 18:59, RW wrote:

On Mon, 30 Nov 2015 20:45:25 +
Sebastian Arcus wrote:


After
setting up a site-wide bayes database as per the wiki instructions
and fixing file permissions etc., and feeding it about 300 spam
messages (I don't get a lot of spam in general) and 12,000 ham
messages of my own hand sorted email, the score for the same sample
spam message I mentioned in my original post jumped from 1.4 to
104.5 !!

I had no idea that bayes filtering can have such a dramatic effect on
a message with only a small amount of text in it!

It doesn't. At most it adds 3.7 points with default scores. You've
probably picked-up more points from net tests, but it's still a huge
change.

It seems to me that the huge score comes from a shortcircuit rule - 
which adds or subtracts, depending on email, 100 points. However, this 
is now also causing problems with proper spam not being recognised - but 
I'm going to start a different thread about it - as it is a different 
problem from my original post.




Re: How to tell if DnsBlocklists are definitely being used by my Spamassassin setup

2015-11-30 Thread Sebastian Arcus

On 30/11/15 18:01, Reindl Harald wrote:



On 30.11.2015 at 18:30, Sebastian Arcus wrote:

spamassassin -D  < /path/to/spam-example.eml

Thank you Harald. I did - and it looks like SA does contact lots of
DNSBL's and it receives various messages in reply. Nothing that looks
like failures or errors. I can attach the output here - but it is a lot.
Would this mean that the DNSBL's are working correctly in my setup - but
spammers somehow manage to keep on sending from "clean" domains all the
time - and I should look into some other way of stopping this type of
spam? The messages I'm talking about are typical spam, with one or two
sentences in the email body and one or two links - usually advertising
life insurance, solar panels and similar. None of them are from proper
companies or entities I have ever dealt with


your main problem is that Bayes is not working, because there are no 
BAYES_xx tags in your headers - collect as many clear spam *and* clear 
ham samples as possible; you need at least 200 ham samples before 
Bayes is used

Thank you for that. Bayes was enabled, but looking closer at the debug 
output, it wasn't used as there weren't enough tokens/samples. Although 
I've been training it for years, it never really worked, as the training 
was done as root while the spam filtering is done as another user - 
separate databases, wrong permissions, etc. After setting up a 
site-wide Bayes database as per the wiki instructions, fixing file 
permissions and feeding it about 300 spam messages (I don't get a 
lot of spam in general) and 12,000 ham messages of my own hand-sorted 
email, the score for the same sample spam message I mentioned in my 
original post jumped from 1.4 to 104.5!!
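A quick way to check whether a database like this has crossed the training threshold is to read the nspam/nham counters printed by "sa-learn --dump magic". The sketch below is a minimal Python helper, assuming the usual layout of that output (the count in the third column, the label after "non-token data:") and SpamAssassin's default minimums of 200 spam and 200 ham (bayes_min_spam_num / bayes_min_ham_num). The function names are hypothetical. It has to run as the same user spamd filters as, otherwise sa-learn may open a different, per-user database, which is the trap described above.

```python
import subprocess

# SpamAssassin's default minimums before Bayes results are used
# (bayes_min_spam_num / bayes_min_ham_num); change these if local.cf overrides them.
MIN_SPAM = 200
MIN_HAM = 200


def parse_magic(text: str) -> dict:
    """Pull the nspam/nham counters out of 'sa-learn --dump magic' output.

    Assumes counter lines shaped like:
        0.000          0        300          0  non-token data: nspam
    i.e. the count is the third whitespace-separated column.
    """
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        label = line.split("non-token data:", 1)[1].strip()
        if label in ("nspam", "nham"):
            counts[label] = int(line.split()[2])
    return counts


def bayes_ready(counts: dict) -> bool:
    """True when both counters reach the minimums above."""
    return counts.get("nspam", 0) >= MIN_SPAM and counts.get("nham", 0) >= MIN_HAM


def read_magic() -> str:
    """Run sa-learn and return its magic dump (hypothetical helper).

    Run this as the user spamd filters as, so the same database is opened.
    """
    return subprocess.run(
        ["sa-learn", "--dump", "magic"],
        capture_output=True, text=True, check=True,
    ).stdout
```

With the counts mentioned in this thread (about 300 spam and 12,000 ham), bayes_ready(parse_magic(read_magic())) would return True; with 150 ham it would not.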


I had no idea that Bayes filtering could have such a dramatic effect on a 
message with only a small amount of text in it! I will probably need to 
keep an eye on things, follow through with more tweaking and try to 
implement your other suggestions, but at the moment the difference it 
makes is dramatic. Thank you.
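The root-versus-filtering-user mix-up described above can be spotted by checking who owns the Bayes directory and its files. This is a minimal Python sketch for a Unix system; the directory (for example /var/spool/spamd/bayes) and the "spamd" user are assumptions to adapt, and foreign_owned is a hypothetical helper.

```python
import pwd
from pathlib import Path


def foreign_owned(bayes_dir: str, expected_uid: int) -> list:
    """List (path, owner) for the directory and files not owned by expected_uid.

    The directory itself is checked too: the filtering user needs to create
    lock and journal files there, not just read the existing database files.
    """
    root = Path(bayes_dir)
    mismatches = []
    for path in [root, *sorted(root.iterdir())]:
        uid = path.stat().st_uid
        if uid != expected_uid:
            try:
                owner = pwd.getpwuid(uid).pw_name
            except KeyError:  # uid with no passwd entry
                owner = str(uid)
            mismatches.append((str(path), owner))
    return mismatches
```

For example, foreign_owned("/var/spool/spamd/bayes", pwd.getpwnam("spamd").pw_uid) returns an empty list when everything belongs to spamd; any root-owned entries it reports are candidates for a chown.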




Re: How to tell if DnsBlocklists are definitely being used by my Spamassassin setup

2015-11-30 Thread Sebastian Arcus



On 30/11/15 16:41, Reindl Harald wrote:



Am 30.11.2015 um 17:24 schrieb Sebastian Arcus:

OK - this might be a basic question, but recently the detection rate on
my SA install has been really unreliable, so I decided that the first
step is to be sure it is using the public DNS blocklists and Razor. My
setup:

1. SpamAssassin 3.4.1
2. I have BIND configured as a recursive, non-forwarding, caching DNS
server.
3. spamassassin --lint doesn't return any errors or failures.
4. My init.pre contains "loadplugin
Mail::SpamAssassin::Plugin::URIDNSBL"


Here is the report included in one of the emails which is spam, but
wasn't detected as such:

Content analysis details:   (1.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at http://www.dnswl.org/, low
                            trust
                            [212.227.15.41 listed in list.dnswl.org]
 1.0 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 1.3 RDNS_NONE              Delivered to internal network by a host with no
                            rDNS
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines


Does the above mean that the DNSBL tests were applied but returned zero
values, or that they were skipped? I'm not sure how to find out which it
is. I'm happy to attach some sample emails which weren't detected, or any
other useful info. Thank you.


RCVD_IN_DNSWL_LOW is the opposite of "returned zero values", but why 
not just pass a sample through SA in debug mode?


spamassassin -D  < /path/to/spam-example.eml

Thank you Harald. I did - and it looks like SA does contact lots of 
DNSBLs and it receives various messages in reply. Nothing looks 
like failures or errors. I can attach the output here, but it is a lot. 
Would this mean that the DNSBLs are working correctly in my setup, but 
spammers somehow manage to keep sending from "clean" domains all the 
time, and I should look into some other way of stopping this type of 
spam? The messages I'm talking about are typical spam, with one or two 
sentences in the email body and one or two links - usually advertising 
life insurance, solar panels and similar. None of them are from proper 
companies or entities I have ever dealt with.
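Rather than reading the whole debug dump by eye, the DNS-related lines can be filtered out of it. This is a minimal Python sketch, assuming that "spamassassin -D" writes its debug text to stderr and that DNS list activity is logged under the "dns", "async" and "uridnsbl" debug facilities; check a real run of your version and adjust DNS_FACILITIES. The helper names are hypothetical.

```python
import subprocess

# Debug facilities assumed to carry DNS list activity in 'spamassassin -D'
# output; compare with a real run of your version and adjust.
DNS_FACILITIES = ("dbg: dns:", "dbg: async:", "dbg: uridnsbl:")


def dns_debug_lines(debug_text: str) -> list:
    """Keep only the debug lines that come from the facilities above."""
    return [line for line in debug_text.splitlines()
            if any(tag in line for tag in DNS_FACILITIES)]


def run_debug(eml_path: str) -> str:
    """Run 'spamassassin -D < eml_path' and return the debug text (stderr)."""
    with open(eml_path, "rb") as fh:
        proc = subprocess.run(["spamassassin", "-D"], stdin=fh,
                              capture_output=True)
    return proc.stderr.decode("utf-8", errors="replace")
```

An empty result from dns_debug_lines(run_debug("/path/to/spam-example.eml")) would suggest no lookups were made; a list of queries and answers means the DNSBLs are being consulted.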




How to tell if DnsBlocklists are definitely being used by my Spamassassin setup

2015-11-30 Thread Sebastian Arcus
OK - this might be a basic question, but recently the detection rate on 
my SA install has been really unreliable, so I decided that the first 
step is to be sure it is using the public DNS blocklists and Razor. My 
setup:

1. SpamAssassin 3.4.1
2. I have BIND configured as a recursive, non-forwarding, caching DNS server.
3. spamassassin --lint doesn't return any errors or failures.
4. My init.pre contains "loadplugin Mail::SpamAssassin::Plugin::URIDNSBL"

Here is the report included in one of the emails which is spam, but 
wasn't detected as such:


Content analysis details:   (1.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at http://www.dnswl.org/, low
                            trust
                            [212.227.15.41 listed in list.dnswl.org]
 1.0 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 1.3 RDNS_NONE              Delivered to internal network by a host with no
                            rDNS
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines


Does the above mean that the DNSBL tests were applied but returned zero values, 
or that they were skipped? I'm not sure how to find out which it is. I'm happy 
to attach some sample emails which weren't detected, or any other useful info. 
Thank you.
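The report table above can be read mechanically. The sketch below pulls the (rule, score) pairs out of it and keeps those whose names start with RCVD_IN_ or URIBL_, which by SpamAssassin naming convention are DNS list lookups; treat that prefix test as a heuristic. A hit such as RCVD_IN_DNSWL_LOW shows that lookups are happening, but an empty result does not prove the lists were skipped, only that nothing matched; the debug run is what shows whether queries were sent. The function names are hypothetical.

```python
import re

# A hit line starts with the score, then the rule name, e.g.
# "-0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at ..."
HIT_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)(?:\s|$)")

# Heuristic: rule-name prefixes that conventionally denote DNS list lookups
# (received-IP lists and URI blocklists).
DNS_PREFIXES = ("RCVD_IN_", "URIBL_")


def rule_hits(report: str) -> list:
    """Return (rule, score) pairs from a 'Content analysis details' table."""
    hits = []
    for line in report.splitlines():
        m = HIT_LINE.match(line)
        if m:
            hits.append((m.group(2), float(m.group(1))))
    return hits


def dns_list_hits(report: str) -> list:
    """Keep only the hits whose rule names look like DNS list rules."""
    return [(rule, score) for rule, score in rule_hits(report)
            if rule.startswith(DNS_PREFIXES)]
```

Header, separator and wrapped description lines do not start with a number followed by an upper-case rule name, so they are skipped.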



