Re: score=19.9 points, tflags=autolearn_force; = autolearn=no autolearn_force=no; WTF?

2015-04-22 Thread Kevin A. McGrail

On 4/21/2015 11:48 PM, David B Funk wrote:

> I've got some home-grown rules that I trust, to which I have added
> tflags autolearn_force
>
> Recently I've seen some spam that hit those rules and racked up enough
> points that they should have auto-learned. But the scoring analysis
> explicitly says autolearn=no autolearn_force=no.
>
> What's going on here?
Rules are categorized differently (header vs. body), and you likely aren't
meeting the autolearn requirements:


The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.

Note: SpamAssassin requires at least 3 points from the header, and 3
points from the body to auto-learn as spam.  Therefore, the minimum
working value for this option is 6.

If the test option autolearn_force is set, the minimum value will
remain at 6 points but there is no requirement that the points come
from body and header rules.  This option is useful for autolearning
with rules that are considered to be extremely safe indicators of
the spaminess of a message.
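To make the quoted rule concrete, here is a minimal Python sketch of the spam-side point requirements as the documentation above describes them (the option being documented is bayes_auto_learn_threshold_spam, default 12.0). It is not SpamAssassin's code: the function and parameter names are hypothetical, it assumes the header-only and body-only point totals have already been summed, and treating the threshold as inclusive is an assumption.

```python
# Per the quoted documentation: at least 3 header points and 3 body points.
REQUIRED_HEAD_POINTS = 3.0
REQUIRED_BODY_POINTS = 3.0


def meets_spam_point_requirements(total, head_points, body_points,
                                  threshold=12.0, forced=False):
    """Hypothetical helper: does a message clear the spam-side point checks?

    total: score used for autolearning.
    threshold: bayes_auto_learn_threshold_spam (default 12.0).
    forced: True if any rule that hit has tflags autolearn_force.
    """
    if total < threshold:  # threshold treated as inclusive here (assumption)
        return False
    if forced:
        # autolearn_force drops the separate header/body minimums.
        return True
    # Otherwise the 3+3 rule applies: both areas must contribute enough.
    return head_points >= REQUIRED_HEAD_POINTS and body_points >= REQUIRED_BODY_POINTS
```

For example, a message totalling 19.9 points with a hypothetical split of only 1.0 header points fails the 3+3 rule, and passes it only when one of its rules carries autolearn_force.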


> Is the autolearn_force being ignored because of the initial BAYES_00
> score? Is there an 'autolearn_force_yes_I_really_mean_it' tflag that
> can be used to overcome that inhibition?



I'd run with debugging enabled and look for these debug messages:

    # a rule that hit has tflags autolearn_force:
    dbg("learn: auto-learn: autolearn_force flagged for a rule. Removing seperate body and head point threshold.  Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
    dbg("learn: auto-learn: autolearn_force flagged because of rule(s): $force_autolearn_names");
  } else {
    # no autolearn_force rule hit:
    dbg("learn: auto-learn: autolearn_force not flagged for a rule. Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
  }
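A small filter can pull these lines out of the debug stream. This is a sketch under assumptions: it expects the output of the spamassassin script run with the learn debug area, e.g. spamassassin -D learn < /tmp/food-0 2>&1 >/dev/null | python3 autolearn_dbg.py (the script name is hypothetical, and checking a message this way may itself trigger autolearning). It matches only on the "learn: auto-learn" prefix visible in the dbg() calls above; the separator used in $force_autolearn_names is not known here, so it splits on commas and whitespace.

```python
import re
import sys

# Prefix shared by the AutoLearnThreshold dbg() messages quoted above.
PREFIX = "learn: auto-learn"
# Captures whatever follows "rule(s): " on the autolearn_force line.
FORCE_RE = re.compile(r"autolearn_force flagged because of rule\(s\): (.+)$")


def autolearn_lines(lines):
    """Return the auto-learn debug lines, with trailing whitespace stripped."""
    return [line.rstrip() for line in lines if PREFIX in line]


def forced_rules(lines):
    """Return rule names reported as forcing autolearn (empty list if none).

    Assumption: names are separated by commas and/or whitespace.
    """
    names = []
    for line in autolearn_lines(lines):
        match = FORCE_RE.search(line)
        if match:
            names.extend(n for n in re.split(r"[,\s]+", match.group(1)) if n)
    return names


# Only read stdin when something is piped in, so importing the file is harmless.
if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    debug_output = sys.stdin.readlines()
    for line in autolearn_lines(debug_output):
        print(line)
    print("autolearn_force rules:", ", ".join(forced_rules(debug_output)) or "(none)")
```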

regards,
KAM


Re: score=19.9 points, tflags=autolearn_force; = autolearn=no autolearn_force=no; WTF?

2015-04-22 Thread RW
On Tue, 21 Apr 2015 22:48:46 -0500 (CDT)
David B Funk wrote:

> Is the autolearn_force being ignored because of the initial BAYES_00
> score?

Yes. Bayes points in the opposite direction (here BAYES_00 at -1.9)
prevent auto-training. All the force flag does is override the 3+3
(header + body) rule.
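That can be sketched the same way: the autolearn_force flag only relaxes the point requirements, while a separate check vetoes spam autolearning when the Bayes ("learn"-flagged) rules point toward ham. The -1.0 cut-off below is an assumption (my reading of Mail::SpamAssassin::Plugin::AutoLearnThreshold in 3.4; verify it against your installed copy), and the helper name is hypothetical. Fed the Bayes scores from David's two reports, -1.9 for BAYES_00 and then 2.0 for BAYES_80, it reproduces the no/yes difference he saw.

```python
# ASSUMPTION: -1.0 is the "learner said ham" cut-off; verify against your
# installed Mail::SpamAssassin::Plugin::AutoLearnThreshold before relying on it.
LEARNER_SAID_HAM_POINTS = -1.0


def spam_autolearn_allowed(passes_point_requirements, learned_points):
    """Hypothetical helper: final spam-autolearn decision.

    passes_point_requirements: result of the threshold and 3+3 checks, with the
        3+3 part already waived if an autolearn_force rule hit.
    learned_points: summed score of the 'learn'-flagged rules (the BAYES_* rules).
    """
    if not passes_point_requirements:
        return False
    # autolearn_force does not bypass this veto: a hammy Bayes verdict blocks it.
    return learned_points >= LEARNER_SAID_HAM_POINTS


# Bayes contribution in the two reports from this thread.
first_run = spam_autolearn_allowed(True, -1.9)   # BAYES_00
second_run = spam_autolearn_allowed(True, 2.0)   # BAYES_80, after sa-learn
print(first_run, second_run)  # prints: False True
```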

> Is there an 'autolearn_force_yes_I_really_mean_it' tflag that
> can be used to overcome that inhibition?

Not as such, but you can get that behaviour by transferring the score
of BAYES_00 into two mutually exclusive meta-rules, one with tflags
learn and the other with tflags noautolearn. The former retains the
sanity check; the latter doesn't.
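The effect of this workaround can be modelled in a few lines: BAYES_00's score is carried by exactly one of two meta-rules, and only the one with tflags learn counts toward the points the sanity check inspects. This is a model, not SpamAssassin configuration. The meta-rule names are hypothetical, splitting the two metas on whether one of your trusted autolearn_force rules hit is an assumption (the split condition isn't specified), and the -1.0 cut-off is an assumed value from my reading of AutoLearnThreshold.pm in 3.4.

```python
from dataclasses import dataclass

# ASSUMPTION: cut-off below which the learner is taken to have "said ham".
LEARNER_SAID_HAM_POINTS = -1.0


@dataclass(frozen=True)
class Hit:
    name: str
    score: float
    tflags: frozenset


def bayes00_replacement(bayes00_score, trusted_rule_hit):
    """Return the (hypothetical) meta-rule hit that carries BAYES_00's score.

    The two metas are mutually exclusive: exactly one of them fires.
    """
    if trusted_rule_hit:
        return Hit("MY_BAYES_00_NOAUTOLEARN", bayes00_score, frozenset({"noautolearn"}))
    return Hit("MY_BAYES_00_LEARN", bayes00_score, frozenset({"learn"}))


def learned_points(hits):
    """Sum the scores of 'learn'-flagged hits: what the sanity check inspects."""
    return sum(hit.score for hit in hits if "learn" in hit.tflags)


def message_total(hits):
    """Sum of all hit scores: the score the message is actually given."""
    return sum(hit.score for hit in hits)


def sanity_check_passes(hits):
    return learned_points(hits) >= LEARNER_SAID_HAM_POINTS
```

Either way the message total still includes BAYES_00's -1.9; only what the sanity check sees changes.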


score=19.9 points, tflags=autolearn_force; = autolearn=no autolearn_force=no; WTF?

2015-04-21 Thread David B Funk

I've got some home-grown rules that I trust, to which I have added
tflags autolearn_force

Recently I've seen some spam that hit those rules and racked up enough
points that they should have auto-learned. But the scoring analysis
explicitly says autolearn=no autolearn_force=no.

What's going on here?

  # spamc -R  /tmp/food-0
  19.9/6.0
  Checker-Version SpamAssassin 3.4.0 (2014-02-07) on xyzzy.engr.uiowa.edu
  Content analysis details:   (19.9 points, 6.0 required, autolearn=no autolearn_force=no)

   pts rule name              description
  ---- ---------------------- --------------------------------------------------
    10 SURBL_URI_DBF4         Contains an URL in My SURBL list 4
                              [URIs: zxrich.com]
   4.0 SURBL_URI_DBF2         Contains an URL in My SURBL list 2
                              [URIs: zxrich.com]
  -0.0 RCVD_IN_MSPIKE_H4      RBL: Very Good reputation (+4)
                              [178.23.244.208 listed in wl.mailspike.net]
   0.0 MISSING_HEADERS        Missing To: header
  -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                              [score: 0.]
   1.1 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                              [cf: 100]
   1.9 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                              above 50%
                              [cf: 100]
   0.9 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   2.0 KAM_OBF                Obfuscated Porn Spams
   0.8 KAM_ASCII_DIVIDERS     Spam that uses ascii formatting tricks
  -0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
   1.0 TO_CC_NONE             No To: or Cc: header
  -0.0 T__RECEIVED_2          More than one untrusted relay
   0.1 KHOP_SC_CIDR8          Relay CIDR /8 is among worst in SpamCop

The odd thing is that if I explicitly learn them by hand with
sa-learn --spam --mbox /tmp/food-0, the 'autolearn_force=yes' suddenly
takes effect (with no other change: exact same message, seconds later).

  # spamc -R  /tmp/food-0
  23.8/6.0
  Checker-Version SpamAssassin 3.4.0 (2014-02-07) on xyzzy.engr.uiowa.edu
  Content analysis details:   (23.8 points, 6.0 required, autolearn=unavailable autolearn_force=yes (SURBL_URI_DBF4))

   pts rule name              description
  ---- ---------------------- --------------------------------------------------
    10 SURBL_URI_DBF4         Contains an URL in My SURBL list 4
                              [URIs: zxrich.com]
   4.0 SURBL_URI_DBF2         Contains an URL in My SURBL list 2
                              [URIs: zxrich.com]
   0.0 MISSING_HEADERS        Missing To: header
  -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
  -0.0 RCVD_IN_MSPIKE_H4      RBL: Very Good reputation (+4)
                              [178.23.244.208 listed in wl.mailspike.net]
   2.0 BAYES_80               BODY: Bayes spam probability is 80 to 95%
                              [score: 0.9197]
   1.1 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                              [cf: 100]
   1.9 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                              above 50%
                              [cf: 100]
   0.9 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   2.0 KAM_OBF                Obfuscated Porn Spams
   0.8 KAM_ASCII_DIVIDERS     Spam that uses ascii formatting tricks
  -0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
   1.0 TO_CC_NONE             No To: or Cc: header
  -0.0 T__RECEIVED_2          More than one untrusted relay
   0.1 KHOP_SC_CIDR8          Relay CIDR /8 is among worst in SpamCop
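The two totals can be checked by hand: every non-zero rule score is identical in both reports except the Bayes rule, so the 3.9-point jump is exactly BAYES_80's 2.0 minus BAYES_00's -1.9. A quick Python check, using the rounded scores as printed (the 0.0 and -0.0 entries are left out):

```python
# Non-zero rule scores common to both spamc -R reports, as printed.
COMMON_SCORES = {
    "SURBL_URI_DBF4": 10.0,
    "SURBL_URI_DBF2": 4.0,
    "RAZOR2_CF_RANGE_51_100": 1.1,
    "RAZOR2_CF_RANGE_E8_51_100": 1.9,
    "RAZOR2_CHECK": 0.9,
    "KAM_OBF": 2.0,
    "KAM_ASCII_DIVIDERS": 0.8,
    "TO_CC_NONE": 1.0,
    "KHOP_SC_CIDR8": 0.1,
}
BAYES_00 = -1.9  # first report
BAYES_80 = 2.0   # second report, after sa-learn

common = sum(COMMON_SCORES.values())
first_total = round(common + BAYES_00, 1)
second_total = round(common + BAYES_80, 1)
print(first_total, second_total, round(second_total - first_total, 1))  # prints: 19.9 23.8 3.9
```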

Is the autolearn_force being ignored because of the initial BAYES_00
score? Is there an 'autolearn_force_yes_I_really_mean_it' tflag that
can be used to overcome that inhibition?

--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include std_disclaimer.h
Better is not better, 'standard' is better. B{