Re: Bayes not auto-learning?
On 02/24/2018 01:05 AM, Amir Caspi wrote:
> On Feb 23, 2018, at 11:47 PM, David B Funk wrote:
>> It could have 20 points from a whole bunch of body rules but if it only hit 2
>> points via header rules it still will not auto-learn.
>
> Gotcha. The spam in question that triggered this hit a lot of rules, but it is hard for me to tell on cursory inspection whether it satisfies sufficient header and body points. But it LOOKS like there should be at least 3 points from header (MISSING_HEADERS, FREEMAIL_FORGED_REPLYTO, among others) and certainly 3 body (MONEY_FRAUD_3 at the very least). The actual spam report is this:
>
> * 0.0 FSL_CTYPE_WIN1251 Content-Type only seen in 419 spam
> * 0.0 NSL_RCVD_FROM_USER Received from User
> * 1.0 MISSING_HEADERS Missing To: header
> * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
> *      [score: 0.5004]
> * 1.1 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
> * 0.0 FROM_MISSP_MSFT From misspaced + supposed Microsoft tool
> * 0.0 FSL_NEW_HELO_USER Spam's using Helo and User
> * 2.6 MSOE_MID_WRONG_CASE No description available.
> * 0.0 FROM_MISSP_USER From misspaced, from "User"
> * 1.0 RDNS_DYNAMIC Delivered to internal network by host with
> *      dynamic-looking rDNS
> * 0.0 LOTS_OF_MONEY Huge... sums of money
> * 0.0 FROM_MISSP_XPRIO Misspaced FROM + X-Priority
> * 1.6 REPLYTO_WITHOUT_TO_CC No description available.
> * 0.0 AXB_XMAILER_MIMEOLE_OL_024C2 Yet another X header trait
> * 0.0 MSGID_FROM_MTA_HEADER Message-Id was added by a relay
> * 0.0 FSL_BULK_SIG Bulk signature with no Unsubscribe
> * 2.1 FREEMAIL_FORGED_REPLYTO Freemail in Reply-To, but not From
> * 1.0 FREEMAIL_REPLYTO Reply-To/From or Reply-To/body contain different
> *      freemails
> * 0.0 TO_NO_BRKTS_FROM_MSSP Multiple header formatting problems
> * 1.9 FORGED_MUA_OUTLOOK Forged mail pretending to be from MS Outlook
> * 1.6 TO_NO_BRKTS_DYNIP To: lacks brackets and dynamic rDNS
> * 0.0 FILL_THIS_FORM Fill in a form with personal information
> * 2.0 TO_NO_BRKTS_MSFT To: lacks brackets and supposed Microsoft tool
> * 2.0 FILL_THIS_FORM_LONG Fill in a form with personal information
> * 3.1 FROM_MISSP_FREEMAIL From misspaced + freemail provider
> * 3.0 MONEY_FRAUD_3 Lots of money and several fraud phrases
>
> But, it still didn't autolearn. (I can post the entire spample if the above seems like it should have autolearned.)
>
>> Another possible factor, if you have "bayes_auto_learn_on_error" enabled,
>> then autolearn will be skipped if Bayes already agrees with the condition of
>> the message. IE: if the message is already classified as BAYES_99 then it
>> won't bother auto-learning it as yet another high-ranking spam.
>
> I do not have that enabled. Also, as you can see from above, this hit BAYES_50.
>
> Does the above provide an indication as to why it didn't autolearn?
>
> Thanks!
>
> --- Amir

I found the best thing to do is to set up a hidden mail server (iRedMail) and split a copy of all mail to it, to sort and filter into a Ham and a Spam folder based on rule hits and scoring. Then I run a nightly sa-learn on the Ham and Spam folders (in that order).

The few questionable emails that score in the middle stay in the Inbox, so I just have to drag-n-drop them into the Ham or Spam folder, which takes a few minutes a day. Some that are new phishing campaigns or are from compromised accounts are copied into a Spamcop folder that automatically submits them to my Spamcop account.

I also use the Ham and Spam folders for the nightly SA masscheck to help get new rules validated and a new 72_scores.cf update out daily via sa-update.

--
David Jones
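The nightly training run described above could be scripted roughly as follows. This is only a minimal sketch: the folder names `Ham` and `Spam` come from the message, but the mail root path, the function names, and the dry-run switch are assumptions made for illustration, and the poster's actual setup is not shown in the thread. It relies only on the standard `sa-learn --ham` and `sa-learn --spam` options.

```python
import subprocess
from pathlib import Path


def build_sa_learn_commands(mail_root):
    """Return the sa-learn invocations, ham first and spam second.

    mail_root is a hypothetical directory that contains the sorted
    "Ham" and "Spam" folders mentioned in the message.
    """
    root = Path(mail_root)
    return [
        ["sa-learn", "--ham", str(root / "Ham")],
        ["sa-learn", "--spam", str(root / "Spam")],
    ]


def run_nightly_learn(mail_root, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    commands = build_sa_learn_commands(mail_root)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the run if sa-learn exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_nightly_learn("/path/to/sorter", dry_run=False)` from a nightly cron job would train on ham before spam, matching the order given in the message; the path here is a placeholder.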
Re: Bayes not auto-learning?
On 2/24/2018 2:05 AM, Amir Caspi wrote:
> Does the above provide an indication as to why it didn't autolearn?

No, the above does not help as the autolearning is complicated. I believe a few years ago I added debug output or headers or something that tried to make it clearer.

If it doesn't autolearn, I would not stress. It's not a simplistic, black or white decision based on a single factor.

Off-hand, I can't find the work I did but $status->get_autolearn_points() might help you dig into the code.

Regards,
KAM
Re: Bayes not auto-learning?
On Feb 23, 2018, at 11:47 PM, David B Funk wrote:
> It could have 20 points from a whole bunch of body rules but if it only hit 2
> points via header rules it still will not auto-learn.

Gotcha. The spam in question that triggered this hit a lot of rules, but it is hard for me to tell on cursory inspection whether it satisfies sufficient header and body points. But it LOOKS like there should be at least 3 points from header (MISSING_HEADERS, FREEMAIL_FORGED_REPLYTO, among others) and certainly 3 body (MONEY_FRAUD_3 at the very least). The actual spam report is this:

* 0.0 FSL_CTYPE_WIN1251 Content-Type only seen in 419 spam
* 0.0 NSL_RCVD_FROM_USER Received from User
* 1.0 MISSING_HEADERS Missing To: header
* 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
*      [score: 0.5004]
* 1.1 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
* 0.0 FROM_MISSP_MSFT From misspaced + supposed Microsoft tool
* 0.0 FSL_NEW_HELO_USER Spam's using Helo and User
* 2.6 MSOE_MID_WRONG_CASE No description available.
* 0.0 FROM_MISSP_USER From misspaced, from "User"
* 1.0 RDNS_DYNAMIC Delivered to internal network by host with
*      dynamic-looking rDNS
* 0.0 LOTS_OF_MONEY Huge... sums of money
* 0.0 FROM_MISSP_XPRIO Misspaced FROM + X-Priority
* 1.6 REPLYTO_WITHOUT_TO_CC No description available.
* 0.0 AXB_XMAILER_MIMEOLE_OL_024C2 Yet another X header trait
* 0.0 MSGID_FROM_MTA_HEADER Message-Id was added by a relay
* 0.0 FSL_BULK_SIG Bulk signature with no Unsubscribe
* 2.1 FREEMAIL_FORGED_REPLYTO Freemail in Reply-To, but not From
* 1.0 FREEMAIL_REPLYTO Reply-To/From or Reply-To/body contain different
*      freemails
* 0.0 TO_NO_BRKTS_FROM_MSSP Multiple header formatting problems
* 1.9 FORGED_MUA_OUTLOOK Forged mail pretending to be from MS Outlook
* 1.6 TO_NO_BRKTS_DYNIP To: lacks brackets and dynamic rDNS
* 0.0 FILL_THIS_FORM Fill in a form with personal information
* 2.0 TO_NO_BRKTS_MSFT To: lacks brackets and supposed Microsoft tool
* 2.0 FILL_THIS_FORM_LONG Fill in a form with personal information
* 3.1 FROM_MISSP_FREEMAIL From misspaced + freemail provider
* 3.0 MONEY_FRAUD_3 Lots of money and several fraud phrases

But, it still didn't autolearn. (I can post the entire spample if the above seems like it should have autolearned.)

> Another possible factor, if you have "bayes_auto_learn_on_error" enabled,
> then autolearn will be skipped if Bayes already agrees with the condition of
> the message. IE: if the message is already classified as BAYES_99 then it
> won't bother auto-learning it as yet another high-ranking spam.

I do not have that enabled. Also, as you can see from above, this hit BAYES_50.

Does the above provide an indication as to why it didn't autolearn?

Thanks!

--- Amir
Re: Bayes not auto-learning?
On 2018-02-23 22:32, Amir Caspi wrote:
> So, I've been trying to tweak my setup and noticed that VERY few of my
> emails are being autolearned as spam, even when their spam score
> is far above the autolearn threshold. The threshold is set to 12; I
> just saw a spam with score >25 not being autolearned.

Sigh. This really is a FAQ, and I did ask it myself (maybe more than once). Read the fine documentation. In short: the score that is compared to the threshold for autolearning is _not_ the normal score that determines spam/ham.

Despite the fact that it is documented, I find the algorithm too opaque to feel in control.

--
Please don't Cc: me privately on mailing lists and Usenet, if you also post the followup to the list or newsgroup. To reply privately _only_ on Usenet and on broken lists which rewrite From, fetch the TXT record for no-use.mooo.com.
Re: Bayes not auto-learning?
On Fri, 23 Feb 2018, Amir Caspi wrote:
> Hi all,
>
> So, I've been trying to tweak my setup and noticed that VERY few of my emails are being autolearned as spam, even when their spam score is far above the autolearn threshold. The threshold is set to 12; I just saw a spam with score >25 not being autolearned.
>
> Are there rules that prevent autolearning? If so, why? If a spam scores really high because it hits (let's say) 10 or more rules, but just one of those rules is enough to prevent autolearning, that seems overly restrictive, no?
>
> For example, for one of my users, out of about 650 spams received in the last month, only 10 have been autolearned. For another user, only 12 of nearly 1400. That seems like a very low percentage, and clearly some high-scoring spams are not being auto-learned.
>
> Any explanation is appreciated!
>
> Thanks!
>
> --- Amir

If you read the spamassassin documentation about Bayes auto-learning you will see that there are several conditions that must be satisfied.

For example, there are some types of rules which aren't considered at all when computing the auto-learning threshold score (such as white/black list scores, rules tagged with the noautolearn tflag, or the actual Bayes score itself).

Of the types of rules which are allowed, at least 3 of those points must come from header type rules and at least 3 of those points must come from body type rules.

So a spam can have 100 points from a blacklist and not auto-learn. It could have 20 points from a whole bunch of body rules but if it only hit 2 points via header rules it still will not auto-learn.

Another possible factor, if you have "bayes_auto_learn_on_error" enabled, then autolearn will be skipped if Bayes already agrees with the condition of the message. IE: if the message is already classified as BAYES_99 then it won't bother auto-learning it as yet another high-ranking spam.

What I usually see in auto-learned spam is that they hit a number of network RBL rules (spamhaus, SORBS, etc) and a number of body rules such as RAZOR, URIBLS, etc.

--
Dave Funk
University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center
Sys_admin/Postmaster/cell_admin
Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
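The conditions listed in this reply can be sketched as a small Python check. This is a simplified illustration of the description above, not SpamAssassin's actual code: the `RuleHit` type, the `area` labels, the `excluded` flag, and both function names are hypothetical, the 12-point threshold is the one from the original question, and the real implementation applies further conditions that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class RuleHit:
    name: str
    score: float
    area: str               # "header", "body", or "other" (hypothetical labels)
    excluded: bool = False  # list rules, noautolearn-tagged rules, BAYES_* rules


def autolearn_points(hits):
    """Return (total, header, body) points over the rules that count."""
    counted = [h for h in hits if not h.excluded]
    total = sum(h.score for h in counted)
    header = sum(h.score for h in counted if h.area == "header")
    body = sum(h.score for h in counted if h.area == "body")
    return total, header, body


def would_autolearn_as_spam(hits, threshold=12.0, min_header=3.0, min_body=3.0):
    """Apply the three checks from the reply: overall threshold on the
    counted points, at least 3 header points, at least 3 body points."""
    total, header, body = autolearn_points(hits)
    return total >= threshold and header >= min_header and body >= min_body
```

With this sketch, a message whose only hit is a 100-point blacklist rule (marked `excluded=True`) is not learned, and neither is one with 20 body points but only 2 header points, which mirrors the two examples in the reply.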
Bayes not auto-learning?
Hi all,

So, I've been trying to tweak my setup and noticed that VERY few of my emails are being autolearned as spam, even when their spam score is far above the autolearn threshold. The threshold is set to 12; I just saw a spam with score >25 not being autolearned.

Are there rules that prevent autolearning? If so, why? If a spam scores really high because it hits (let's say) 10 or more rules, but just one of those rules is enough to prevent autolearning, that seems overly restrictive, no?

For example, for one of my users, out of about 650 spams received in the last month, only 10 have been autolearned. For another user, only 12 of nearly 1400. That seems like a very low percentage, and clearly some high-scoring spams are not being auto-learned.

Any explanation is appreciated!

Thanks!

--- Amir