Re: Error when trying to re-use Bayes database from one server to another
On 12/02/16 19:14, Reindl Harald wrote:
> On 12.02.2016 at 20:06, Marc Perkel wrote:
>> Any chance that the parent directory structure doesn't have enough permissions? The error message says it can't access it so there's your clue. Since the files themselves seem to have good permissions I would look at the directories.
>
> see previous mail - that was already verified
>
> looking closer "No such file or directory" is not a permission problem
>
> there was a hint "please re-run with -D"
>
> at least re-use bayes on different servers, even over different operating systems is no problem, our bayes is running on 3 own and 2 foreign machines for a long time now with great results

I've checked and triple checked everything. Unless I'm missing something blindingly obvious, I don't think that error message is accurate. If I delete the bayes files and restart spamd, then on running sa-learn new ones are created in exactly the same place, with the same names and same permissions - and they work fine. But the ones brought over from the other server don't work.

PS - Regarding the "re-run with -D for more information" - I guess that message is slightly pointless, as it keeps on saying that even when you run it with "-D"
Re: Error when trying to re-use Bayes database from one server to another
On 12/02/16 16:59, Reindl Harald wrote:
> On 12.02.2016 at 17:29, Sebastian Arcus wrote:
>> As per advice from this list, I have been re-using my bayes databases on several different servers running SA. On one of the servers though, the database is not accepted. I re-transferred them several times over ssh, to make sure they were not corrupted. The database files are in the correct location, with correct permissions and owned by the correct user:
>>
>> # ls -l /var/spool/spamd/bayes/
>> total 5912
>> -rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
>> -rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks
>>
>> The version of SA on both the donor and receiving servers is 3.4.1. When I try to learn a new message on the receiving server (where I moved the bayes files), I get the following error:
>
> su - spamd
> stat /var
> stat /var/spool
> stat /var/spool/spamd
> stat /var/spool/spamd/bayes
>
> Linux is not like Windows - if you don't have access to a parent folder you just don't have access

Sorry - previous reply sent in HTML format by mistake:

root@mdr-server:/# su - spamd
No directory, logging in with HOME=/

spamd@mdr-server:/$ stat /var
  File: `/var'
  Size: 4096   Blocks: 8   IO Block: 4096   directory
Device: 900h/2304d   Inode: 12   Links: 16
Access: (0755/drwxr-xr-x)   Uid: (0/root)   Gid: (0/root)
Access: 2016-01-18 09:28:23.0 +
Modify: 2016-01-18 09:22:47.0 +
Change: 2016-01-18 09:28:23.744774236 +

spamd@mdr-server:/$ stat /var/spool
  File: `/var/spool'
  Size: 4096   Blocks: 8   IO Block: 4096   directory
Device: 900h/2304d   Inode: 118   Links: 22
Access: (0755/drwxr-xr-x)   Uid: (0/root)   Gid: (0/root)
Access: 2015-02-03 14:28:33.0 +
Modify: 2015-12-03 17:41:28.859794403 +
Change: 2015-12-03 17:41:28.859794403 +

spamd@mdr-server:/$ stat /var/spool/spamd
  File: `/var/spool/spamd'
  Size: 4096   Blocks: 8   IO Block: 4096   directory
Device: 900h/2304d   Inode: 15473107   Links: 3
Access: (0770/drwxrwx---)   Uid: (1037/spamd)   Gid: (252/spamd)
Access: 2015-12-03 17:41:28.859794403 +
Modify: 2015-12-03 17:41:32.011239989 +
Change: 2015-12-03 17:46:59.187806044 +

spamd@mdr-server:/$ stat /var/spool/spamd/bayes/
  File: `/var/spool/spamd/bayes/'
  Size: 4096   Blocks: 8   IO Block: 4096   directory
Device: 900h/2304d   Inode: 15473106   Links: 3
Access: (0776/drwxrwxrw-)   Uid: (1037/spamd)   Gid: (252/spamd)
Access: 2015-12-03 17:41:32.011239989 +
Modify: 2016-02-12 16:20:53.778709980 +
Change: 2016-02-12 16:20:53.778709980 +
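The `stat` sequence above checks each parent directory by hand. A minimal Python sketch of the same check follows. The function name `check_path_components` is made up for illustration, and the script has to be run as the user in question (for example from a `su - spamd` shell), because `os.access` tests the permissions of the calling user.

```python
import os
from pathlib import Path


def check_path_components(target):
    """Return (path, exists, searchable) for every directory from / down to target.

    'searchable' means the calling user has execute (x) permission on the
    directory, which is what is needed to reach anything below it.
    """
    target = Path(target).resolve()
    # Path.parents runs from the closest parent up to the root; reverse it.
    components = list(target.parents)[::-1] + [target]
    return [(str(p), p.exists(), os.access(p, os.X_OK)) for p in components]


if __name__ == "__main__":
    # Path taken from the thread; adjust for your own layout.
    for path, exists, searchable in check_path_components("/var/spool/spamd/bayes"):
        print(f"{path}: exists={exists} searchable={searchable}")
```

If every component is reported as existing and searchable, as the `stat` output in this thread suggests, the directory tree is probably not the cause and the database files themselves are the next thing to look at.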
Error when trying to re-use Bayes database from one server to another
As per advice from this list, I have been re-using my bayes databases on several different servers running SA. On one of the servers though, the database is not accepted. I re-transferred them several times over ssh, to make sure they were not corrupted. The database files are in the correct location, with correct permissions and owned by the correct user:

# ls -l /var/spool/spamd/bayes/
total 5912
-rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
-rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks

The version of SA on both the donor and receiving servers is 3.4.1. When I try to learn a new message on the receiving server (where I moved the bayes files), I get the following error:

# su - spamd -c "/usr/bin/sa-learn -D --spam /New\ UnansweredSexHookup\ Request.eml"
Feb 12 16:20:53.777 [12973] dbg: locker: mode is 438
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: created /var/spool/spamd/bayes/bayes.lock.mdr-server.mdrinteriors.co.uk.12973
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: trying to get lock on /var/spool/spamd/bayes/bayes with 0 retries
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: link to /var/spool/spamd/bayes/bayes.lock: link ok
Feb 12 16:20:53.778 [12973] dbg: bayes: tie-ing to DB file R/W /var/spool/spamd/bayes/bayes_toks
Feb 12 16:20:53.779 [12973] dbg: bayes: untie-ing DB file toks
Feb 12 16:20:53.779 [12973] dbg: locker: safe_unlock: unlink /var/spool/spamd/bayes/bayes.lock
bayes: cannot open bayes databases /var/spool/spamd/bayes/bayes_* R/W: tie failed: No such file or directory
Learned tokens from 0 message(s) (1 message(s) examined)
Feb 12 16:20:53.779 [12973] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x93106d0) implements 'learner_close', priority 0
ERROR: the Bayes learn function returned an error, please re-run with -D for more information at /usr/bin/sa-learn line 498.
Re: Word macros
On 22/12/15 10:07, Reindl Harald wrote:
> On 22.12.2015 at 10:26, Sebastian Arcus wrote:
>> In terms of ClamAV, I've had next to zero hit rates for new viruses arriving over email in the last few months (although it is being updated regularly) - so I'm starting to wonder if there is any point in using ClamAV for scanning emails at all
>
> without sanesecurity signatures likely no

Thanks for that - I wasn't aware of their existence. I'll give them a go
Re: Word macros
On 22/12/15 08:04, Axb wrote:
> On 12/21/2015 11:46 PM, Alex wrote:
>> Hi all,
>>
>> For the past few days we've been hit with Word macro viruses/spam that isn't being tagged by clamav or spamassassin, and I thought someone might be able to take a look:
>>
>> http://pastebin.com/cAWcAbm2
>>
>> This one still isn't tagged by clamav/sanesecurity. I've submitted this sample, so perhaps it is now, but I thought someone might have some ideas for a meta or something else in the message that could more generally tag these?
>>
>> Anyone else seeing these? I've also already added the IP to the client blocklist.
>
> You may need to add some commercial AV to your layer...
>
> https://www.virustotal.com/en/file/cbf2c9dd334e786e53958927c05ac1c3f749de21e9e0b1cb551c5b8dd3e34a56/analysis/1450770862/
>
> quite a few to choose from...

I've been seeing some of these Word docs with macros in the last few days as well. The worrying thing is that some of the (reputable) commercial AV scanners still don't detect them after being in the wild for at least two days:

https://www.virustotal.com/en/file/b2a8a2afe818469ba48a3dbafec9ce4ed49ebc0ab7ff0de68f743e4eab3fa5e1/analysis/1450775869/

In terms of ClamAV, I've had next to zero hit rates for new viruses arriving over email in the last few months (although it is being updated regularly) - so I'm starting to wonder if there is any point in using ClamAV for scanning emails at all.
Re: Strange behaviour by the AWL module
On 12/12/15 23:43, Benny Pedersen wrote:
> On December 12, 2015 8:33:28 PM Sebastian Arcus wrote:
>> I guess I must be using the default settings - as I don't think I've configured anything in particular for AWL
>
> change default /16 cidr to new default /24 for ipv4, for ipv6 use /64, if you like to track on /32 for ipv4 then each ipv4 will have no shared awl scores possible
>
> also changing the default awl factor from 0.5 to 0.25 will reduce how much benefit comes from the previous score
>
> if changing settings, delete awl db

Thank you - for the time being I've disabled the AWL module - as I've worked out that on my type of setup it doesn't appear to be really needed.
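To make the effect of the CIDR mask concrete, here is a small Python sketch using the standard `ipaddress` module. It only illustrates how a prefix length groups addresses into a shared bucket; it is not SpamAssassin's own code, `awl_bucket` is a made-up helper name, and the second address is invented for the comparison.

```python
import ipaddress


def awl_bucket(ip, ipv4_prefix=16, ipv6_prefix=64):
    """Return the network an address falls into for the given prefix length."""
    addr = ipaddress.ip_address(ip)
    prefix = ipv4_prefix if addr.version == 4 else ipv6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


a = "192.74.251.225"
b = "192.74.3.10"  # hypothetical second sender in the same /16

for prefix in (16, 24, 32):
    same = awl_bucket(a, prefix) == awl_bucket(b, prefix)
    print(f"/{prefix}: {awl_bucket(a, prefix)} shared with {b}: {same}")
```

With a /16 mask the two addresses land in the same bucket and would share history for a given From: address; with /24 or /32 they no longer do.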
Re: Strange behaviour by the AWL module
On 12/12/15 19:57, John Hardin wrote:
> On Sat, 12 Dec 2015, Sebastian Arcus wrote:
>> On 12/12/15 18:21, John Hardin wrote:
>>> On Sat, 12 Dec 2015, Sebastian Arcus wrote:
>>>> One of my servers received a spam message which SA missed, with the following report:
>>>>
>>>> -0.4 AWL AWL: Adjusted score from AWL reputation of From: address
>>>>
>>>> After learning the messages as spam into bayes with sa-learn, I get the following report:
>>>>
>>>> -6.1 AWL AWL: Adjusted score from AWL reputation of From: address
>>>>
>>>> Luckily the message is now flagged as spam because I have manually turned up the score on my BAYES_99 and BAYES_999 awhile ago. But what intrigues me is that now the AWL module gives it a -6.1 score. Why would AWL now tilt things heavily towards ham, after the message has just been learned as spam? It seems to be making things worse instead of better. Unless I am misunderstanding what AWL is supposed to be doing?
>>>
>>> You are. The name is misleading. AWL is more a score averager than a whitelist. It's intended to allow for the occasionally spammy-looking email from a historically hammy sender (and vice versa). It has nothing to do with training, which only affects Bayes. Messages from that sender will get negative AWL scores for a while until their traffic history becomes more on the "spam" side.
>>
>> OK - that's kind of what I assumed. What I don't understand is why the AWL score changes after the message has been learned into the Bayes database - and by so much?
>
> It's not that you trained it into Bayes, but that SA had previously only seen email from that source address that was scored as ham. I'm assuming that's the first message you got from that source address? So their entire AWL history is 100% hammy based on the original FN.
>
> You scan the message again, it scores as spammy now for whatever reason; SA checks the AWL history for that sender address and sees "100% hammy" and generates a partially-offsetting negative score.
>
> As that sender's AWL history shifts from "100% hammy" towards "99% spammy" (assuming you ever get mail from that address again) the offsetting score will head towards zero. I don't *think* AWL will generate positive scores for spams from a historically spammy sender (i.e. I think AWL is purely to offset the raw score for anomalies), so you should see AWL scores stop once their history is "mostly spammy".

Thank you for that explanation!
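The averaging described above can be made concrete with a little arithmetic. The AWL plugin documentation gives the adjustment as `finalscore = score + (mean - score) * factor`, with a default factor of 0.5. The sketch below applies that formula to the two reports from the original post, where the pre-AWL totals work out to 3.5 (3.1 + 0.4) and 14.9 (8.8 + 6.1). The sender mean of 2.7 is an assumption: it is simply the value that reproduces both reported adjustments and was not read from any AWL database.

```python
def awl_adjustment(score, mean, factor=0.5):
    """Points AWL adds to a message: (mean - score) * factor."""
    return (mean - score) * factor


SENDER_MEAN = 2.7  # assumption: inferred from the two reports, not measured

for raw in (3.5, 14.9):
    delta = awl_adjustment(raw, SENDER_MEAN)
    print(f"raw {raw:5.1f}  AWL {delta:+.1f}  final {raw + delta:.1f}")
```

Under these assumptions the same sender history yields about -0.4 for the first scan and about -6.1 for the second. The only thing that changed is the raw score, which jumped once the Bayes rules started firing, so the pull back towards the sender's mean became larger.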
Re: Strange behaviour by the AWL module
On 12/12/15 13:06, Benny Pedersen wrote:
> Sebastian Arcus wrote on 2015-12-12 12:51:
>> Why would AWL now tilt things heavily towards ham, after the message has just been learned as spam?
>
> its how AWL works
>
>> It seems to be making things worse instead of better. Unless I am misunderstanding what AWL is supposed to be doing?
>
> what are your settings for AWL plugin ?

I guess I must be using the default settings - as I don't think I've configured anything in particular for AWL
Re: Strange behaviour by the AWL module
On 12/12/15 18:21, John Hardin wrote:
> On Sat, 12 Dec 2015, Sebastian Arcus wrote:
>> One of my servers received a spam message which SA missed, with the following report:
>>
>> -0.4 AWL AWL: Adjusted score from AWL reputation of From: address
>>
>> After learning the messages as spam into bayes with sa-learn, I get the following report:
>>
>> -6.1 AWL AWL: Adjusted score from AWL reputation of From: address
>>
>> Luckily the message is now flagged as spam because I have manually turned up the score on my BAYES_99 and BAYES_999 awhile ago. But what intrigues me is that now the AWL module gives it a -6.1 score. Why would AWL now tilt things heavily towards ham, after the message has just been learned as spam? It seems to be making things worse instead of better. Unless I am misunderstanding what AWL is supposed to be doing?
>
> You are. The name is misleading. AWL is more a score averager than a whitelist. It's intended to allow for the occasionally spammy-looking email from a historically hammy sender (and vice versa). It has nothing to do with training, which only affects Bayes. Messages from that sender will get negative AWL scores for a while until their traffic history becomes more on the "spam" side.

OK - that's kind of what I assumed. What I don't understand is why the AWL score changes after the message has been learned into the Bayes database - and by so much?
Strange behaviour by the AWL module
One of my servers received a spam message which SA missed, with the following report:

Content analysis details:   (3.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                            (noreply[at]live.com)
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.4993]
 2.0 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
-0.4 AWL                    AWL: Adjusted score from AWL reputation of From: address

After learning the messages as spam into bayes with sa-learn, I get the following report:

Content analysis details:   (8.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 4.9 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                            (noreply[at]live.com)
 0.0 HTML_MESSAGE           BODY: HTML included in message
 8.0 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 2.0 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
-6.1 AWL                    AWL: Adjusted score from AWL reputation of From: address

Luckily the message is now flagged as spam because I have manually turned up the score on my BAYES_99 and BAYES_999 awhile ago. But what intrigues me is that now the AWL module gives it a -6.1 score. Why would AWL now tilt things heavily towards ham, after the message has just been learned as spam? It seems to be making things worse instead of better. Unless I am misunderstanding what AWL is supposed to be doing?
Re: Is it worth transferring bayes data between different sites?
On 03/12/15 01:40, Reindl Harald wrote:
> On 03.12.2015 at 01:14, Alex wrote:
>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren wrote:
>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>> Perfect - that's exactly the sort of real-life based advice I was looking for. Many thanks!
>>>
>>> I run a small shared hosting environment, with a global bayes for all users as not enough users are ready/willing/able to take the time to sort ham (although more will press "this is spam") and in general, the results work out well enough.
>>
>> A portion of the bayes database is the header information from the email. What does it mean for those headers that contain info specific to a particular domain or site when it's transferred to another domain or site where those specifics will be different?
>
> see attached php/formail-script and list of ignored/stripped headers
>
> we strip a large portion of headers including especially the Received headers with "formail" and prepend a generic one on top from all samples before training them

Does that mean that transferring bayes databases between sites without stripping the headers wouldn't work - or is it just more effective if one strips the headers?
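The attached php/formail script is not reproduced in this archive, so the following is only a rough Python sketch of the idea described above: drop site-specific headers from a training sample and put a single generic `Received` line on top. The header list and the generic line are invented for illustration and are not the ones from the original attachment.

```python
from email import message_from_string

# Assumptions: both the header list and the generic Received line are made up
# for this sketch; the list attached to the original mail is not shown here.
SITE_SPECIFIC = ("Received", "X-Spam-Status", "X-Spam-Report", "Delivered-To")
GENERIC_RECEIVED = "Received: from localhost (localhost [127.0.0.1]) by localhost"


def normalise_sample(raw):
    """Drop site-specific headers and put one generic Received line on top."""
    msg = message_from_string(raw)
    for name in SITE_SPECIFIC:
        del msg[name]  # removes every occurrence; no error if the header is absent
    return GENERIC_RECEIVED + "\n" + msg.as_string()


sample = (
    "Received: from mx.example.org by mail.example.com; Wed, 2 Dec 2015 09:00:00 +0000\n"
    "Received: from unknown by mx.example.org; Wed, 2 Dec 2015 08:59:00 +0000\n"
    "From: sender@example.net\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)
print(normalise_sample(sample))
```

The result could then be fed to sa-learn in place of the raw sample, so that the tokens learned from headers do not depend on one site's relay names.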
Re: Is it worth transferring bayes data between different sites?
On 03/12/15 00:29, Charles Sprickman wrote:
> Reindl Harald wrote:
>> On 02.12.2015 at 21:50, Charles Sprickman wrote:
>>> Reindl Harald wrote:
>>>> On 02.12.2015 at 12:51, Sebastian Arcus wrote:
>>>>> I hope I'm not exceeding the patience of the list by posting a third question in two days :-) I realise the above question is a "soft" question, probably without a definite "yes" or "no" answer. I am hoping that people with experience of using SA in various environments might be able to throw in some opinions. Based on the documentation, it is clearly possible to transfer a bayes database from one install to another - especially if it is a sitewide database. What I was wondering is if it is worth doing so from a results point of view
>>>>
>>>> we use our global bayes on the incoming MX and share it with our submission servers to stop outgoing spam from hacked accounts
>>>
>>> This is a bit OT, but I have had a hard time finding how to setup a global bayes DB rather than having everything done on a per-user basis. Looking around the SA wiki, I don’t see global DBs addressed. Any tips?
>>
>> https://wiki.apache.org/spamassassin/SiteWideBayesSetup
>>
>> in case you are running spamass-milter that's even the logical default because your milter is running as its own user, with its own .spamassassin directory in the userhome which contains the db
>
> I had a look at that page - I use mysql to store the data, have multiple spamd boxes, and spamc on the inbound servers passing mail to spamd once all the “front door” checks are done. In that config, I end up with unique per-user bayes tokens. I’m looking to just pool everyone together, but don’t see an obvious way to do that. It seems like folks in this thread are however doing that somehow (perhaps just because they are using a milter or similar).

In case it helps: I use SA with exim - and Exim talks over Unix sockets to the spamd daemon. I've used the instructions at the wiki page above to setup the sitewide bayes database - but I don't use MySQL - and it all seems to work as expected.
Re: Is it worth transferring bayes data between different sites?
On 02/12/15 12:55, Reindl Harald wrote:
> On 02.12.2015 at 12:51, Sebastian Arcus wrote:
>> I hope I'm not exceeding the patience of the list by posting a third question in two days :-) I realise the above question is a "soft" question, probably without a definite "yes" or "no" answer. I am hoping that people with experience of using SA in various environments might be able to throw in some opinions. Based on the documentation, it is clearly possible to transfer a bayes database from one install to another - especially if it is a sitewide database. What I was wondering is if it is worth doing so from a results point of view
>
> we use our global bayes on the incoming MX and share it with our submission servers to stop outgoing spam from hacked accounts
>
> additionally we share our bayes with another company which pulls the dumps if the hash file is different every 30 minutes
>
> we as well as the other company do mail hosting on ISP level and the results on both sides are perfect - we share even scorings, whitelists, custom body/subject-rules and the summary is: at least in the same country sharing spamfilter configurations works like a charm

Perfect - that's exactly the sort of real-life based advice I was looking for. Many thanks!
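The "pull the dumps if the hash file is different" step can be sketched in a few lines of Python. This is only an illustration of the comparison, not the script used by either company: the function names are invented, SHA-256 is an assumed choice of hash, and fetching the published hash and the dump itself is left out.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Hex SHA-256 of a file, read in chunks so large dumps are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_changed(local_dump, published_hash):
    """True if there is no local dump yet or its hash differs from the published one."""
    local = Path(local_dump)
    if not local.exists():
        return True
    return file_sha256(local) != published_hash.strip().lower()
```

A job run every 30 minutes would fetch the small hash file first, call `dump_changed`, and only download and import the full dump when it returns True.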
Re: Detecting which shortcircuit rule fires
On 02/12/15 12:56, Reindl Harald wrote:
> On 02.12.2015 at 12:29, Sebastian Arcus wrote:
>> On 02/12/15 09:49, Reindl Harald wrote:
>>> On 02.12.2015 at 10:30, Sebastian Arcus wrote:
>>>> After properly configuring a bayes database and training it following the great advice from this list, I am now having this problem where some spam is not detected properly due to a shortcircuit rule. However, I'm having some difficulty figuring out which one of them is causing the problem. Here is the X-Spam-Report - which should cause the email to be classed as spam, really:
>>>>
>>>> But when the message goes through Exim
>>>
>>> * show the headers of such a message
>>> * there must be a rulename which was triggered for SC
>>> * SC is *not* enabled by default
>>> * SC is even not loaded by default
>>>
>>> i can't even respond quoting your headers
>>
>> Thank you. What I've done:
>>
>> 1. I've disabled shortcircuiting in local.pre. This is not a busy server, and on reflection it is not needed at all.
>>
>> 2. The shortcircuit does get disabled - although I thought it doesn't. It turns out that I should have pointed "spamassassin -D" to the proper site config files with the "--siteconfigpath". The mail daemon process (spamd) does pick on the proper config files fine.
>>
>> I can still post the headers of the email, if you think it would help the list - but without shortcircuiting things look ok now
>
> well i wonder which shortcircuit rules are active by default
>
> i needed even "loadplugin Mail::SpamAssassin::Plugin::Shortcircuit" in "local.cf" on Fedora to get my own rules working at all

You are right - I'm on Slackware, and even here the shortcircuit plugin is not enabled by default. I must have enabled it myself in a misguided bout of enthusiasm
Is it worth transferring bayes data between different sites?
I hope I'm not exceeding the patience of the list by posting a third question in two days :-) I realise the above question is a "soft" question, probably without a definite "yes" or "no" answer. I am hoping that people with experience of using SA in various environments might be able to throw in some opinions.

Based on the documentation, it is clearly possible to transfer a bayes database from one install to another - especially if it is a sitewide database. What I was wondering is if it is worth doing so from a results point of view. For example, I now have a nicely trained bayes with a few thousand of my own clean, hand-picked ham emails and a few hundred spam emails. Would there be a significant benefit in taking this data and using it to setup a fresh SA install on a client's server? Or would the fact that my email usage pattern and the content of the email I receive differ from those of another organisation render the bayes tokens/data useless - so that I am better off starting from scratch there?

I've tried searching online for a discussion on this topic, but the only relevant bit I found is this unanswered post from 2004:

https://mail-archives.apache.org/mod_mbox/spamassassin-users/200406.mbox/%3c20040621150409.ga23...@publinet.it%3E
Re: Detecting which shortcircuit rule fires
On 02/12/15 09:49, Reindl Harald wrote:
> On 02.12.2015 at 10:30, Sebastian Arcus wrote:
>> After properly configuring a bayes database and training it following the great advice from this list, I am now having this problem where some spam is not detected properly due to a shortcircuit rule. However, I'm having some difficulty figuring out which one of them is causing the problem. Here is the X-Spam-Report - which should cause the email to be classed as spam, really:
>>
>> But when the message goes through Exim
>
> * show the headers of such a message
> * there must be a rulename which was triggered for SC
> * SC is *not* enabled by default
> * SC is even not loaded by default
>
> i can't even respond quoting your headers

Thank you. What I've done:

1. I've disabled shortcircuiting in local.pre. This is not a busy server, and on reflection it is not needed at all.

2. The shortcircuit does get disabled - although I thought it doesn't. It turns out that I should have pointed "spamassassin -D" to the proper site config files with the "--siteconfigpath". The mail daemon process (spamd) does pick on the proper config files fine.

I can still post the headers of the email, if you think it would help the list - but without shortcircuiting things look ok now. Many thanks
Detecting which shortcircuit rule fires
After properly configuring a bayes database and training it following the great advice from this list, I am now having this problem where some spam is not detected properly due to a shortcircuit rule. However, I'm having some difficulty figuring out which one of them is causing the problem. Here is the X-Spam-Report - which should cause the email to be classed as spam, really:

X-Spam-Report:
 * 1.2 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
 *     [URIs: completercpt3040.com]
 * 1.4 RCVD_IN_BRBL_LASTEXT RBL: No description available.
 *     [192.74.251.225 listed in bb.barracudacentral.org]
 * 0.1 URIBL_SBL_A Contains URL's A record listed in the SBL blocklist
 *     [URIs: completercpt3040.com]
 * 1.6 URIBL_SBL Contains an URL's NS IP listed in the SBL blocklist
 *     [URIs: completercpt3040.com]
 * 0.0 HTML_MESSAGE BODY: HTML included in message
 * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
 *     [score: 0.4954]
 * 0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 * 0.0 LOTS_OF_MONEY Huge... sums of money
 * 0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
 * 0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
 * 2.0 ADVANCE_FEE_2_NEW_MONEY Advance Fee fraud and lots of money

But when the message goes through Exim, it comes out with -99 score. If I run through the debug output, it becomes clear that a shortcircuit rule fires - but it doesn't say which (see the last line of output - clearly one of the rules is whitelisting the message). I can't even disable shortcircuiting - although I've commented it out in v320.pre, the debug output shows it is still being used:

# spamassassin -D < FWD\ Attention\ Domain\ yourdomain.com\ Notice.eml 2>&1 | grep -i shortcircuit
Dec 2 09:11:28.896 [16962] dbg: plugin: loading Mail::SpamAssassin::Plugin::Shortcircuit from @INC
Dec 2 09:11:30.282 [16962] dbg: config: fixed relative path: /var/lib/spamassassin/3.004001/updates_spamassassin_org/60_shortcircuit.cf
Dec 2 09:11:30.282 [16962] dbg: config: using "/var/lib/spamassassin/3.004001/updates_spamassassin_org/60_shortcircuit.cf" for included file
Dec 2 09:11:30.283 [16962] dbg: config: read file /var/lib/spamassassin/3.004001/updates_spamassassin_org/60_shortcircuit.cf
Dec 2 09:11:34.192 [16962] dbg: plugin: Mail::SpamAssassin::Plugin::Shortcircuit=HASH(0x3a0cbf8) implements 'parsed_metadata', priority 0
Dec 2 09:11:34.394 [16962] dbg: plugin: Mail::SpamAssassin::Plugin::Shortcircuit=HASH(0x3a0cbf8) implements 'have_shortcircuited', priority 0
Dec 2 09:11:34.533 [16962] dbg: plugin: Mail::SpamAssassin::Plugin::Shortcircuit=HASH(0x3a0cbf8) implements 'hit_rule', priority 0
shortcircuit=no autolearn=no autolearn_force=no version=3.4.1
-100 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule
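When a report is this long, it can help to pull the rule names and scores out mechanically and look at the content rules separately from the SHORTCIRCUIT line. The Python sketch below does that for a few lines copied from the output above; the regular expression and the helper name `parse_hits` are my own, and real reports may use layouts this pattern does not cover.

```python
import re

# A few lines copied from the X-Spam-Report in the post, plus the
# SHORTCIRCUIT line from the grep output.
REPORT = """\
* 1.2 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
* [URIs: completercpt3040.com]
* 1.6 URIBL_SBL Contains an URL's NS IP listed in the SBL blocklist
* 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
* [score: 0.4954]
* 2.0 ADVANCE_FEE_2_NEW_MONEY Advance Fee fraud and lots of money
-100 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule
"""

# Optional leading "*", a signed score, then an upper-case rule name.
RULE_LINE = re.compile(r"^\*?[ \t]*(-?\d+(?:\.\d+)?)[ \t]+([A-Z][A-Z0-9_]+)\b", re.M)


def parse_hits(report):
    """Return [(score, rule_name), ...] for every rule line in a report."""
    return [(float(score), name) for score, name in RULE_LINE.findall(report)]


hits = parse_hits(REPORT)
content_total = sum(score for score, name in hits if name != "SHORTCIRCUIT")
print(f"content rules: {content_total:.1f}")
print(f"all rules:     {sum(score for score, _ in hits):.1f}")
```

Continuation lines such as `[URIs: ...]` are skipped because they do not start with a score. The SHORTCIRCUIT entry only records that a shortcircuit happened; the rule that triggered it has to be found elsewhere in the headers or debug output.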
Re: How to tell if DnsBlocklists are definitely being used by my Spamassassin setup
On 01/12/15 18:59, RW wrote:
> On Mon, 30 Nov 2015 20:45:25 + Sebastian Arcus wrote:
>> After setting up a site-wide bayes database as per the wiki instructions and fixing file permissions etc., and feeding it about 300 spam messages (I don't get a lot of spam in general) and 12,000 ham messages of my own hand sorted email, the score for the same sample spam message I mentioned in my original post jumped from 1.4 to 104.5 !! I had no idea that bayes filtering can have such a dramatic effect on a message with only a small amount of text in it!
>
> It doesn't. At most it adds 3.7 points with default scores. You've probably picked-up more points from net tests, but it's still a huge change.

It seems to me that the huge score comes from a shortcircuit rule - which adds or subtracts, depending on email, 100 points. However, this is now also causing problems with proper spam not being recognised - but I'm going to start a different thread about it - as it is a different problem from my original post.
Re: How to tell if DnsBlocklists are definitely being used by my Spamassassin setup
On 30/11/15 18:01, Reindl Harald wrote:
> On 30.11.2015 at 18:30, Sebastian Arcus wrote:
>>> spamassassin -D < /path/to/spam-example.eml
>>
>> Thank you Harald. I did - and it looks like SA does contact lots of DNSBL's and it receives various messages in reply. Nothing that looks like failures or errors. I can attach the output here - but it is a lot. Would this mean that the DNSBL's are working correctly in my setup - but spammers somehow manage to keep on sending from "clean" domains all the time - and I should look into some other way of stopping this type of spam? The messages I'm talking about are typical spam, with one or two sentences in the email body and one or two links - usually advertising life insurance, solar panels and similar. None of them are from proper companies or entities I have ever dealt with
>
> your main problem is that bayes is not working because there are no BAYES_xx tags in your headers - collect as many as possible clear spam *and* clear ham samples, you need at least 200 ham samples before bayes is used

Thank you for that. Bayes was enabled, but looking closer at the debug output, it wasn't used as there weren't enough tokens/samples. Although I've been training it for years, it never really worked as the training was done as root, while the spam filtering is done as another user - separate databases used, wrong permissions etc.

After setting up a site-wide bayes database as per the wiki instructions and fixing file permissions etc., and feeding it about 300 spam messages (I don't get a lot of spam in general) and 12,000 ham messages of my own hand sorted email, the score for the same sample spam message I mentioned in my original post jumped from 1.4 to 104.5 !! I had no idea that bayes filtering can have such a dramatic effect on a message with only a small amount of text in it!

I will probably need to keep an eye on things and follow through with more tweaking and try and implement your other suggestions - but at the moment the difference it makes is dramatic. Thank you.
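One way to check whether the database the scanner actually uses has enough samples is to run `sa-learn --dump magic` as the scanning user (not as root) and read the `nspam` and `nham` counters. The Python sketch below parses such output. The sample text is illustrative only, built from the counts mentioned above rather than copied from a real run, and 200 is SpamAssassin's default minimum for both ham and spam.

```python
import re

MIN_SAMPLES = 200  # default bayes_min_ham_num / bayes_min_spam_num

# Illustrative output only: counts taken from the figures quoted in this post.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        300          0  non-token data: nspam
0.000          0      12000          0  non-token data: nham
"""

COUNT_LINE = re.compile(
    r"^\S+[ \t]+\S+[ \t]+(\d+)[ \t]+\S+[ \t]+non-token data: (nspam|nham)[ \t]*$",
    re.M,
)


def bayes_counts(dump_magic_output):
    """Extract the nspam/nham counters from 'sa-learn --dump magic' output."""
    return {name: int(count) for count, name in COUNT_LINE.findall(dump_magic_output)}


def bayes_ready(counts, minimum=MIN_SAMPLES):
    """True once both ham and spam counts reach the minimum."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum


counts = bayes_counts(SAMPLE_DUMP)
print(counts, "ready:", bayes_ready(counts))
```

If the counters are near zero for the scanning user while root shows thousands, the training has been going into a different database, which is the situation described above.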
Re: How to tell if DnsBlocklists are definitely being used by my Spamassassin setup
On 30/11/15 16:41, Reindl Harald wrote:
> On 30.11.2015 at 17:24, Sebastian Arcus wrote:
>> OK - this might be a basic question, but recently the detection rate on my SA install has been really unreliable, so I decided that the first step is to be sure it is using the public dns blocklists and razor. My setup:
>>
>> 1. Spamassassin 3.4.1
>> 2. I have Bind configured as recursive, non-forwarding, caching DNS server.
>> 3. spamassassin --lint doesn't return any errors or failures.
>> 5. My init.pre contains "loadplugin Mail::SpamAssassin::Plugin::URIDNSBL"
>>
>> Here is the report included in one of the emails which is spam, but wasn't detected as such:
>>
>> Content analysis details:   (1.4 points, 5.0 required)
>>
>>  pts rule name              description
>> ---- ---------------------- --------------------------------------------------
>> -0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at http://www.dnswl.org/, low trust
>>                             [212.227.15.41 listed in list.dnswl.org]
>>  1.0 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
>>  0.0 HTML_MESSAGE           BODY: HTML included in message
>> -0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
>>  0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily valid
>> -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from author's domain
>>  1.3 RDNS_NONE              Delivered to internal network by a host with no rDNS
>>  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
>>
>> Does the above mean that the DNSBL tests were applied, but returned zero values - or would it mean they were skipped. I'm not sure how to find out which one is it? I'm happy to attach some sample emails which weren't detected, or any other useful info. Thank you
>
> RCVD_IN_DNSWL_LOW is the opposite of "returned zero values"
>
> but why not just pass a sample against SA in debug-mode?
>
> spamassassin -D < /path/to/spam-example.eml

Thank you Harald. I did - and it looks like SA does contact lots of DNSBL's and it receives various messages in reply. Nothing that looks like failures or errors. I can attach the output here - but it is a lot.

Would this mean that the DNSBL's are working correctly in my setup - but spammers somehow manage to keep on sending from "clean" domains all the time - and I should look into some other way of stopping this type of spam? The messages I'm talking about are typical spam, with one or two sentences in the email body and one or two links - usually advertising life insurance, solar panels and similar. None of them are from proper companies or entities I have ever dealt with.
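For cross-checking a single listing by hand, it helps to know what name a DNS-based list lookup actually queries: the IPv4 octets are reversed and the list's zone is appended. A minimal Python sketch, with `dnsbl_query_name` as a made-up helper name and the address and zone taken from the report above:

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name a DNSBL/DNSWL lookup uses for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


print(dnsbl_query_name("212.227.15.41", "list.dnswl.org"))
```

The resulting name can be looked up through the same local resolver with any DNS tool and the answer compared with what the SA report shows for that list.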
How to tell if DnsBlocklists are definitely being used by my Spamassassin setup
OK - this might be a basic question, but recently the detection rate on my SA install has been really unreliable, so I decided that the first step is to be sure it is using the public dns blocklists and razor. My setup:

1. Spamassassin 3.4.1
2. I have Bind configured as recursive, non-forwarding, caching DNS server.
3. spamassassin --lint doesn't return any errors or failures.
5. My init.pre contains "loadplugin Mail::SpamAssassin::Plugin::URIDNSBL"

Here is the report included in one of the emails which is spam, but wasn't detected as such:

Content analysis details:   (1.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at http://www.dnswl.org/, low trust
                            [212.227.15.41 listed in list.dnswl.org]
 1.0 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily valid
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from author's domain
 1.3 RDNS_NONE              Delivered to internal network by a host with no rDNS
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines

Does the above mean that the DNSBL tests were applied, but returned zero values - or would it mean they were skipped. I'm not sure how to find out which one is it? I'm happy to attach some sample emails which weren't detected, or any other useful info. Thank you.