Re: score=19.9 points, tflags=autolearn_force; = autolearn=no autolearn_force=no; WTF?

2015-04-22 Thread Kevin A. McGrail

On 4/21/2015 11:48 PM, David B Funk wrote:

> I've got some home-grown rules that I trust, to which I have added
> tflags autolearn_force
>
> Recently I've seen some spam that hit those rules and racked up enough
> points that they should have auto-learned. But the scoring analysis
> explicitly says autolearn=no autolearn_force=no.
>
> What's going on here?

Different rules are categorized differently and you likely aren't 
hitting the requirements:


The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.

Note: SpamAssassin requires at least 3 points from the header, and 3
points from the body to auto-learn as spam.  Therefore, the minimum
working value for this option is 6.

If the test option autolearn_force is set, the minimum value will
remain at 6 points but there is no requirement that the points come
from body and header rules.  This option is useful for autolearning
with rules that are considered to be extremely safe indicators of
the spaminess of a message.
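The threshold rule quoted above can be sketched as a small decision function. This is a simplified model under stated assumptions, not SpamAssassin's implementation: the function and parameter names are hypothetical, and the inputs are assumed to be the non-Bayes points already split into header-only and body-only totals.

```python
def passes_spam_threshold(total, head_points, body_points, threshold,
                          forced=False, required_each=3.0):
    """Simplified model of the spam-side auto-learn threshold (hypothetical helper).

    total, head_points, body_points: non-Bayes points used for the decision.
    threshold: bayes_auto_learn_threshold_spam (the docs say 6 is the minimum
    working value; this sketch does not enforce that).
    forced: True if any hit rule carries tflags autolearn_force.
    """
    if total < threshold:
        # Not high-scoring enough, forced or not.
        return False
    if forced:
        # autolearn_force drops only the 3-header/3-body requirement.
        return True
    return head_points >= required_each and body_points >= required_each


# Body-heavy spam: enough points overall, but almost nothing from headers.
print(passes_spam_threshold(8.0, 0.5, 7.5, threshold=6.0))               # False
print(passes_spam_threshold(8.0, 0.5, 7.5, threshold=6.0, forced=True))  # True
```

The example numbers are made up; they just show that the force flag changes the per-source split and nothing else.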


> Is the autolearn_force being ignored because of the initial BAYES_00
> score? Is there an 'autolearn_force_yes_I_really_mean_it' tflag that
> can be used to overcome that inhibition?



I'd run with debugging enabled (-D) and look for these debug messages:

  dbg("learn: auto-learn: autolearn_force flagged for a rule. Removing seperate body and head point threshold.  Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
  dbg("learn: auto-learn: autolearn_force flagged because of rule(s): $force_autolearn_names");
} else {
  dbg("learn: auto-learn: autolearn_force not flagged for a rule. Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
}

regards,
KAM
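Following KAM's advice, a small helper can pull just the auto-learn lines out of a debug run. This is a sketch under assumptions: it assumes the spamassassin script is on PATH and writes its -D debug output to stderr, and the helper names are hypothetical. Be aware that scanning a message this way may itself auto-learn it.

```python
import subprocess

# Common prefix of the dbg() messages quoted above.
AUTOLEARN_MARKER = "learn: auto-learn"


def autolearn_debug_lines(debug_text):
    """Return only the lines of SpamAssassin debug output that mention auto-learn."""
    return [line for line in debug_text.splitlines() if AUTOLEARN_MARKER in line]


def run_spamassassin_debug(message_path):
    """Scan one message with `spamassassin -D` and return the debug text (stderr)."""
    with open(message_path, "rb") as msg:
        result = subprocess.run(["spamassassin", "-D"], stdin=msg,
                                capture_output=True, check=False)
    return result.stderr.decode("utf-8", errors="replace")
```

For example, `for line in autolearn_debug_lines(run_spamassassin_debug("/tmp/food-0")): print(line)`.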


Re: score=19.9 points, tflags=autolearn_force; = autolearn=no autolearn_force=no; WTF?

2015-04-22 Thread RW
On Tue, 21 Apr 2015 22:48:46 -0500 (CDT)
David B Funk wrote:

 
> Is the autolearn_force being ignored because of the initial BAYES_00
> score?

Yes, a Bayes score in the opposite direction (here BAYES_00 on a message
that otherwise scored as spam) prevents auto-training. All the force flag
does is override the 3+3 rule.

> Is there an 'autolearn_force_yes_I_really_mean_it' tflag that
> can be used to overcome that inhibition?

Not as such, but it is possible to get that behaviour by transferring
the score of BAYES_00 into two mutually exclusive meta-rules, one with
tflags learn and the other with tflags noautolearn. The former will
retain the sanity check and the latter won't.
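RW's explanation and workaround can be modelled in a few lines. This is a simplified sketch, not SpamAssassin's code: it assumes rules with tflags learn (the BAYES_* rules) feed a separate "learner" total that vetoes spam training when it is clearly negative. The cutoff of -1.0 is an assumption, since the thread only says a Bayes score in the opposite direction prevents training, and the meta-rule name is hypothetical.

```python
from dataclasses import dataclass, field

# ASSUMPTION: placeholder for "Bayes clearly says ham"; not taken from this thread.
BAYES_VETO_CUTOFF = -1.0


@dataclass(frozen=True)
class Hit:
    name: str
    score: float
    tflags: frozenset = field(default_factory=frozenset)


def spam_autolearn_vetoed(hits, cutoff=BAYES_VETO_CUTOFF):
    """True if learn-flagged (Bayes) points point the opposite way.

    autolearn_force plays no part in this check; rules flagged noautolearn
    are not counted, so their score can neither help nor veto.
    """
    learner_points = sum(h.score for h in hits if "learn" in h.tflags)
    return learner_points < cutoff


forced = Hit("SURBL_URI_DBF4", 10.0, frozenset({"autolearn_force"}))

# As shipped: BAYES_00 carries tflags learn, so its -1.9 vetoes training.
before = [forced, Hit("BAYES_00", -1.9, frozenset({"learn"}))]

# RW's workaround, sketched: BAYES_00 scores 0 and its score moves to two
# mutually exclusive metas; when a forced rule hits, only the
# (hypothetical) noautolearn meta fires.
after = [forced,
         Hit("BAYES_00", 0.0, frozenset({"learn"})),
         Hit("META_BAYES_00_FORCED", -1.9, frozenset({"noautolearn"}))]

print(spam_autolearn_vetoed(before), spam_autolearn_vetoed(after))  # True False
```

The total message score is the same in both cases; only where the -1.9 is booked changes.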


score=19.9 points, tflags=autolearn_force; = autolearn=no autolearn_force=no; WTF?

2015-04-21 Thread David B Funk

I've got some home-grown rules that I trust, to which I have added
tflags autolearn_force

Recently I've seen some spam that hit those rules and racked up enough
points that they should have auto-learned. But the scoring analysis
explicitly says autolearn=no autolearn_force=no.

What's going on here?

  # spamc -R < /tmp/food-0
  19.9/6.0
  Checker-Version SpamAssassin 3.4.0 (2014-02-07) on xyzzy.engr.uiowa.edu
  Content analysis details:   (19.9 points, 6.0 required, autolearn=no autolearn_force=no)

   pts rule name              description
  ---- ---------------------- --------------------------------------------------
    10 SURBL_URI_DBF4         Contains an URL in My SURBL list 4
                              [URIs: zxrich.com]
   4.0 SURBL_URI_DBF2         Contains an URL in My SURBL list 2
                              [URIs: zxrich.com]
  -0.0 RCVD_IN_MSPIKE_H4      RBL: Very Good reputation (+4)
                              [178.23.244.208 listed in wl.mailspike.net]
   0.0 MISSING_HEADERS        Missing To: header
  -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                              [score: 0.]
   1.1 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                              [cf: 100]
   1.9 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                              above 50%
                              [cf: 100]
   0.9 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   2.0 KAM_OBF                Obfuscated Porn Spams
   0.8 KAM_ASCII_DIVIDERS     Spam that uses ascii formatting tricks
  -0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
   1.0 TO_CC_NONE             No To: or Cc: header
  -0.0 T__RECEIVED_2          More than one untrusted relay
   0.1 KHOP_SC_CIDR8          Relay CIDR /8 is among worst in SpamCop

The odd thing is that if I explicitly learn them manually with
sa-learn --spam --mbox /tmp/food-0, the 'autolearn_force=yes' suddenly
takes effect (with no other change: exact same message, seconds later).

  # spamc -R < /tmp/food-0
  23.8/6.0
  Checker-Version SpamAssassin 3.4.0 (2014-02-07) on xyzzy.engr.uiowa.edu
  Content analysis details:   (23.8 points, 6.0 required, autolearn=unavailable autolearn_force=yes (SURBL_URI_DBF4))

   pts rule name              description
  ---- ---------------------- --------------------------------------------------
    10 SURBL_URI_DBF4         Contains an URL in My SURBL list 4
                              [URIs: zxrich.com]
   4.0 SURBL_URI_DBF2         Contains an URL in My SURBL list 2
                              [URIs: zxrich.com]
   0.0 MISSING_HEADERS        Missing To: header
  -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
  -0.0 RCVD_IN_MSPIKE_H4      RBL: Very Good reputation (+4)
                              [178.23.244.208 listed in wl.mailspike.net]
   2.0 BAYES_80               BODY: Bayes spam probability is 80 to 95%
                              [score: 0.9197]
   1.1 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                              [cf: 100]
   1.9 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                              above 50%
                              [cf: 100]
   0.9 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   2.0 KAM_OBF                Obfuscated Porn Spams
   0.8 KAM_ASCII_DIVIDERS     Spam that uses ascii formatting tricks
  -0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
   1.0 TO_CC_NONE             No To: or Cc: header
  -0.0 T__RECEIVED_2          More than one untrusted relay
   0.1 KHOP_SC_CIDR8          Relay CIDR /8 is among worst in SpamCop

Is the autolearn_force being ignored because of the initial BAYES_00
score? Is there an 'autolearn_force_yes_I_really_mean_it' tflag that
can be used to overcome that inhibition?

--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include std_disclaimer.h
Better is not better, 'standard' is better. B{
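Adding up the two reports shows what RW describes: the non-Bayes part is identical in both runs and only the Bayes rule changes sign. This is plain arithmetic on the displayed scores, which come from the scoreset used at check time; auto-learning recomputes the non-Bayes points with the corresponding non-Bayes scoreset, so the real auto-learn figure may differ. The helper name is hypothetical.

```python
# Scores copied from the two spamc -R reports above; rules shown as
# 0.0/-0.0 are omitted since they add nothing.
first_report = {
    "SURBL_URI_DBF4": 10.0, "SURBL_URI_DBF2": 4.0, "BAYES_00": -1.9,
    "RAZOR2_CF_RANGE_51_100": 1.1, "RAZOR2_CF_RANGE_E8_51_100": 1.9,
    "RAZOR2_CHECK": 0.9, "KAM_OBF": 2.0, "KAM_ASCII_DIVIDERS": 0.8,
    "TO_CC_NONE": 1.0, "KHOP_SC_CIDR8": 0.1,
}
# Same hits after sa-learn, except BAYES_00 became BAYES_80.
second_report = dict(first_report)
del second_report["BAYES_00"]
second_report["BAYES_80"] = 2.0


def split_bayes(scores):
    """Return (total, non-Bayes total, Bayes total), rounded to one decimal."""
    bayes = sum(v for k, v in scores.items() if k.startswith("BAYES_"))
    total = sum(scores.values())
    return round(total, 1), round(total - bayes, 1), round(bayes, 1)


for label, report in (("before sa-learn", first_report),
                      ("after sa-learn", second_report)):
    print(label, split_bayes(report))
# before sa-learn (19.9, 21.8, -1.9)
# after sa-learn (23.8, 21.8, 2.0)
```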


Re: autolearn_force

2014-05-26 Thread Karsten Bräckelmann
I suggest spending some time reading the relevant documentation, in
particular the M::SA::Conf and AutoLearnThreshold docs.

  http://spamassassin.apache.org/doc/


On Sat, 2014-05-24 at 22:12 -0700, Ian Zimmerman wrote:
> So, now I am really confused.  I think I did everything right in
> user_prefs:
>
> tflags INVALID_DATE autolearn_force

tflags are part of the Privileged Settings (see section in M::SA::Conf
docs). For security and efficiency reasons, these are not allowed in
user_prefs, unless allow_user_rules is enabled.

(Just for completeness, dunno if you enabled it.)


> Nonetheless:
>
> X-Spam-Score: 6.9
> X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
>  HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
>  T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
> X-Spam-Autolearn: no autolearn_force=no

As RW already pointed out, quoting the AutoLearnThreshold man page, the
score used for the auto-learn decision is not the same as the overall
score shown above. To prevent Bayes self-feeding, (a) the Bayesian rules
themselves are ignored, and (b) the respective non-Bayes score-set is
used.

The latter often (but not necessarily) results in higher scores per
rule. However, 6.9 minus 3.7 for the BAYES_* rules (3.2 points) is
unlikely to exceed the threshold, even with the respective non-Bayes
score-set.
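Karsten's subtraction can be checked by parsing Ian's X-Spam-Tests header. A sketch with a hypothetical helper name; as he notes, the real auto-learn decision uses the non-Bayes scoreset, whose per-rule scores are not visible in this header, so the non-Bayes figure here is only indicative.

```python
def parse_spam_tests(header_value):
    """Parse an X-Spam-Tests style 'RULE=score,RULE=score,...' list into a dict."""
    scores = {}
    for item in header_value.split(","):
        item = item.strip()  # also drops the folding whitespace between lines
        if item:
            name, _, score = item.partition("=")
            scores[name] = float(score)
    return scores


tests = parse_spam_tests(
    "BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,\n"
    " HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,\n"
    " T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7"
)
total = sum(tests.values())
without_bayes = sum(v for k, v in tests.items() if not k.startswith("BAYES_"))
print(f"total={total:.3f} without BAYES_*={without_bayes:.3f}")
# total=6.927 without BAYES_*=3.227
```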


> And here's a case where it doesn't autolearn ham (same user_prefs as above):
>
> X-Spam-Status: No
> X-Spam-Level:
> X-Spam-Score: -2.7
> X-Spam-Tests: BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,
>  FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,
>  HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,RCVD_IN_DNSWL_LOW=-0.7,
>  RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001
> X-Spam-Autolearn: no autolearn_force=no
>
> The documentation certainly doesn't say anything like the 3/3 and force
> mechanism is in place for ham.  So this _should_ autolearn.  Right?  Right??

No. At no point does the documentation suggest autolearn_force would
work with ham. The AutoLearnThreshold doc mentions this option only in
the context of the spam threshold, and the M::SA::Conf doc is even more
clear about it.

  autolearn_force
  The test will be subject to less stringent autolearn thresholds.

  Normally, SpamAssassin will require 3 points from the header and
  3 points from the body to be auto-learned as spam. This option
  keeps the threshold at 6 points total but changes it to have no
  regard to the source of the points.


Moreover, that message did *not* trigger any of the rules you set tflags
autolearn_force for. So regardless of its actual score, and regardless of
it not being spam, that message would never have been considered for
autolearn_force.
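The same arithmetic explains why Ian's ham example was not learned, given his bayes_auto_learn_threshold_nonspam of -2.00. A simplified sketch: it approximates "rules with tflags learn" by the BAYES_ name prefix, uses the displayed check-time scores, and ignores any other ham-side checks SpamAssassin may apply. The helper name is hypothetical.

```python
# Ian's ham example, scores as shown in X-Spam-Tests (check-time scoreset).
ham_tests = {
    "BAYES_00": -1.9, "DKIM_SIGNED": 0.1, "DKIM_VALID": -0.1,
    "DKIM_VALID_AU": -0.1, "FREEMAIL_FORGED_FROMDOMAIN": 0.001,
    "FREEMAIL_FROM": 0.001, "HEADER_FROM_DIFFERENT_DOMAINS": 0.001,
    "HTML_MESSAGE": 0.001, "RCVD_IN_DNSWL_LOW": -0.7,
    "RCVD_IN_MSPIKE_H2": -0.001, "SPF_PASS": -0.001,
}
NONSPAM_THRESHOLD = -2.00  # bayes_auto_learn_threshold_nonspam from Ian's user_prefs


def would_learn_as_ham(scores, threshold):
    """Simplified model: ham needs non-Bayes points below the nonspam threshold.

    No 3/3 split and no autolearn_force apply on the ham side.
    """
    points = sum(v for k, v in scores.items() if not k.startswith("BAYES_"))
    return points < threshold, round(points, 3)


print(would_learn_as_ham(ham_tests, NONSPAM_THRESHOLD))  # (False, -0.798)
```

Without BAYES_00, the message sits at about -0.8, well above -2.00, so it is not a ham auto-learn candidate.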


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: autolearn_force

2014-05-25 Thread RW
On Sat, 24 May 2014 22:12:10 -0700
Ian Zimmerman wrote:

> So, now I am really confused.  I think I did everything right in
> user_prefs:
>  ...
> Nonetheless:
>
> X-Spam-Score: 6.9
> X-Spam-Tests:
> BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
> HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
> T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
> X-Spam-Autolearn: no autolearn_force=no
>
> And here's a case where it doesn't autolearn ham (same user_prefs as
> above):
>
...
> The documentation certainly doesn't say anything like the 3/3 and
> force mechanism is in place for ham.  So this _should_ autolearn.
> Right?  Right??

Mail::SpamAssassin::Plugin::AutoLearnThreshold(3)



NAME
   Mail::SpamAssassin::Plugin::AutoLearnThreshold - threshold-based
   discriminator for Bayes auto-learning

SYNOPSIS
 loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold

DESCRIPTION
   This plugin implements the threshold-based auto-learning
   discriminator for SpamAssassin's Bayes subsystem.  Auto-learning is a
   mechanism whereby high-scoring mails (or low-scoring mails, for
   non-spam) are fed into its learning systems without user intervention,
   during scanning.

   Note that certain tests are ignored when determining whether a
   message should be trained upon:

   o   rules with tflags set to 'learn' (the Bayesian rules)

   o   rules with tflags set to 'userconf' (user configuration)

   o   rules with tflags set to 'noautolearn'

   Also note that auto-learning occurs using scores from either
   scoreset 0 or 1, depending on what scoreset is used during
   message check.  It is likely that the message check and
   auto-learn scores will be different.
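The two notes above can be expressed directly. A sketch with hypothetical helper names, assuming SpamAssassin's usual numbering of the four scoresets (0: no Bayes, no network tests; 1: network tests; 2: Bayes; 3: Bayes and network tests).

```python
# Rules with any of these tflags are skipped for the auto-learn score.
IGNORED_TFLAGS = frozenset({"learn", "userconf", "noautolearn"})

BAYES_BIT = 2  # scoresets 2 and 3 are the ones with Bayes enabled


def autolearn_scoreset(check_scoreset):
    """Map the scoreset used for the check (0-3) to the one used for auto-learn.

    Clearing the Bayes bit gives 0 or 1: the same network setting, without Bayes.
    """
    return check_scoreset & ~BAYES_BIT


def counted_rules(hit_tflags):
    """Return the hit rules that count toward the auto-learn score.

    hit_tflags maps each hit rule name to its set of tflags (empty if none).
    """
    return sorted(name for name, flags in hit_tflags.items()
                  if not flags & IGNORED_TFLAGS)


print([autolearn_scoreset(s) for s in range(4)])  # [0, 1, 0, 1]
# The tflags below are illustrative, not the stock rule definitions.
print(counted_rules({"BAYES_99": {"learn"}, "RDNS_NONE": set(),
                     "URIBL_BLACK": {"net", "autolearn_force"}}))
# ['RDNS_NONE', 'URIBL_BLACK']
```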



Re: autolearn_force

2014-05-25 Thread Kevin A. McGrail

On 5/25/2014 1:12 AM, Ian Zimmerman wrote:

> So, now I am really confused.  I think I did everything right in
> user_prefs:
>
> bayes_auto_learn 1
> bayes_auto_learn_threshold_nonspam -2.00
> bayes_auto_learn_threshold_spam 6.00
> bayes_auto_learn_on_error 0
>
> [snip]
>
> tflags URIBL_DBL_SPAM autolearn_force
> tflags URIBL_JP_SURBL autolearn_force
> tflags URIBL_BLACK autolearn_force
> tflags INVALID_DATE autolearn_force
>
> Nonetheless:
>
> X-Spam-Score: 6.9
> X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
>   HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
>   T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
> X-Spam-Autolearn: no autolearn_force=no
>
> And here's a case where it doesn't autolearn ham (same user_prefs as above):
>
> X-Spam-Status: No
> X-Spam-Level:
> X-Spam-Score: -2.7
> X-Spam-Tests: BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,
>   FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,
>   HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,RCVD_IN_DNSWL_LOW=-0.7,
>   RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001
> X-Spam-Autolearn: no autolearn_force=no
>
> The documentation certainly doesn't say anything like the 3/3 and force
> mechanism is in place for ham.  So this _should_ autolearn.  Right?  Right??


Hi Ian,

Perhaps a bug.  Hard to say from so little output.

Please turn on -D and pastebin the output.  If you also want to pastebin 
the email, I'll look at it.


But if not, these are the current debug messages I'll be looking for:

  dbg("learn: auto-learn: autolearn_force flagged for a rule. Removing seperate body and head point threshold.  Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
  dbg("learn: auto-learn: autolearn_force flagged because of rule(s): $force_autolearn_names");
} else {
  dbg("learn: auto-learn: autolearn_force not flagged for a rule. Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
}

Regards,
KAM


Re: autolearn_force

2014-05-25 Thread Axb

On 05/25/2014 07:12 AM, Ian Zimmerman wrote:

> tflags URIBL_DBL_SPAM autolearn_force
> tflags URIBL_JP_SURBL autolearn_force
> tflags URIBL_BLACK autolearn_force
> tflags INVALID_DATE autolearn_force


URIBL rules are not set to use 'userconf' (user configuration)

so entries in user_prefs shouldn't affect the results

if anything it should go in a system-wide rule file (i.e. local.cf),
not user_prefs

your:
tflags URIBL_DBL_SPAM autolearn_force

should probably read:

tflags  URIBL_DBL_SPAM   net domains_only autolearn_force

etc, etc - and not in user_

iirc, this will also influence Bayes's scoring/learning behaviour.

modifying rules' tflags should be done with care
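Axb's suggestion to repeat "net domains_only" makes sense if a later tflags line for a rule replaces that rule's earlier flags instead of adding to them. The sketch below models that assumption; it is not SpamAssassin's parser, the helper name is hypothetical, and the "stock" flags are simply the ones Axb lists.

```python
def apply_tflags_lines(lines):
    """Collect 'tflags RULE flag ...' lines; a later line for the same rule
    replaces the earlier flags (the assumption being modelled)."""
    tflags = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "tflags":
            tflags[parts[1]] = set(parts[2:])
    return tflags


stock = "tflags URIBL_DBL_SPAM net domains_only"

# Adding only the new flag silently drops 'net' and 'domains_only'.
print(apply_tflags_lines([stock, "tflags URIBL_DBL_SPAM autolearn_force"]))
# Repeating the existing flags keeps them.
print(apply_tflags_lines([stock, "tflags URIBL_DBL_SPAM net domains_only autolearn_force"]))
```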


Re: autolearn_force

2014-05-25 Thread Ian Zimmerman
On Sun, 25 May 2014 16:40:44 +0200
Axb axb.li...@gmail.com wrote:

Axb URIBL rules are not set to use 'userconf' (user configuration)
Axb so entries in user_prefs shouldn't affect the results

Axb if anything it should go in a system wide rule (ie: local.cf) (not
Axb user_prefs)

Axb your: tflags URIBL_DBL_SPAM autolearn_force

Axb should probably read:

Axb tflags URIBL_DBL_SPAM net domains_only autolearn_force

Axb etc, etc - and not in user_

Axb iirc, this will also influence Bayes's scoring/learning behaviour.
Axb modifying rules' tflags should be done with care

But it does autolearn in _some_ instances:

May 25 08:33:50 host spamd[13561]: spamd: result: Y 10 -
BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL
scantime=1.7,size=6496,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
raddr=127.0.0.1,rport=52900,
mid=24251386609892242521126914206...@lun5bim.dollazo.eu,bayes=1.00,
autolearn=spam autolearn_force=yes (URIBL_JP_SURBL,URIBL_DBL_SPAM,URIBL_BLACK)

So I'm afraid I can't be satisfied with this explanation.

The whole autolearning settings thing just feels way unpredictable for
me.  If there are so many hurdles, does anyone actually do it?

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: autolearn_force

2014-05-25 Thread Axb

On 05/25/2014 05:59 PM, Ian Zimmerman wrote:

> On Sun, 25 May 2014 16:40:44 +0200
> Axb axb.li...@gmail.com wrote:
>
> Axb URIBL rules are not set to use 'userconf' (user configuration)
> Axb so entries in user_prefs shouldn't affect the results
>
> Axb if anything it should go in a system wide rule (ie: local.cf) (not
> Axb user_prefs)
>
> Axb your: tflags URIBL_DBL_SPAM autolearn_force
>
> Axb should probably read:
>
> Axb tflags URIBL_DBL_SPAM net domains_only autolearn_force
>
> Axb etc, etc - and not in user_
>
> Axb iirc, this will also influence Bayes's scoring/learning behaviour.
> Axb modifying rules' tflags should be done with care
>
> But it does autolearn in _some_ instances:
>
> May 25 08:33:50 host spamd[13561]: spamd: result: Y 10 -
> BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
> RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL
> scantime=1.7,size=6496,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
> raddr=127.0.0.1,rport=52900,
> mid=24251386609892242521126914206...@lun5bim.dollazo.eu,bayes=1.00,
> autolearn=spam autolearn_force=yes (URIBL_JP_SURBL,URIBL_DBL_SPAM,URIBL_BLACK)


Yes, when it reached certain conditions and a score above 15.0

you can tune that score via local.cf entries:

bayes_auto_learn_threshold_nonspam
bayes_auto_learn_threshold_spam

Depending on your traffic, you may want to raise/lower those scores.
There are safe default settings, but playing with those scores helps tune
learning sensitivity.



> The whole autolearning settings thing just feels way unpredictable for
> me.


It feels unpredictable because of the overwhelming number of variables
that influence learning. Just stick to experimenting with settings until
you find the best performance for your traffic.
There is no one-size-fits-all, because each system's ham/spam traffic
can be so different.



> If there are so many hurdles, does anyone actually do it?


Since Bayes was added to SA, I've used nothing else.
(2004? 2005?).







Re: autolearn_force

2014-05-25 Thread Ian Zimmerman
On Sun, 25 May 2014 20:06:22 +0200
Axb axb.li...@gmail.com wrote:

Axb Yes, when it reached certain conditions and a score above 15.0

Axb you can tune that score via local.cf entries:

Axb bayes_auto_learn_threshold_nonspam bayes_auto_learn_threshold_spam

Please see the prefs in my post upthread - I have already done this.
That's why I am so confused, and frankly, irritated.  I have done
everything the documentation says to do, and it still behaves magically
and strangely.



Re: autolearn_force

2014-05-25 Thread RW
On Sun, 25 May 2014 08:59:28 -0700
Ian Zimmerman wrote:

> On Sun, 25 May 2014 16:40:44 +0200
> Axb axb.li...@gmail.com wrote:
>
> Axb URIBL rules are not set to use 'userconf' (user configuration)
> Axb so entries in user_prefs shouldn't affect the results
>
> Axb if anything it should go in a system wide rule (ie: local.cf)
> Axb (not user_prefs)
>
> Axb your: tflags URIBL_DBL_SPAM autolearn_force
>
> Axb should probably read:
>
> Axb tflags URIBL_DBL_SPAM net domains_only autolearn_force
>
> Axb etc, etc - and not in user_
>
> Axb iirc, this will also influence Bayes's scoring/learning
> Axb behaviour. modifying rules' tflags should be done with care
>
> But it does autolearn in _some_ instances:
>
> May 25 08:33:50 host spamd[13561]: spamd: result: Y 10 -
> BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
> RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL
> scantime=1.7,size=6496,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
> raddr=127.0.0.1,rport=52900,
> mid=24251386609892242521126914206...@lun5bim.dollazo.eu,bayes=1.00,
> autolearn=spam autolearn_force=yes
> (URIBL_JP_SURBL,URIBL_DBL_SPAM,URIBL_BLACK)
 

A difference between this and the other one you quoted is that this one
appears to be over the 6-point threshold and the other wasn't. (I
haven't done the exact arithmetic for scoreset 1, but the other was only
slightly over 6 in scoreset 3 including BAYES_99, and this one is well
over.) That would mean that even if autolearn_force worked correctly, the
other one still wouldn't have been autolearned.

It would be interesting to see if you can reproduce the previous
autolearn_force=no result on a very high-scoring spam. It's possible
there is a cosmetic bug where autolearn_force is not logged
correctly when the spam isn't going to be autolearned anyway.


Re: autolearn_force

2014-05-24 Thread Ian Zimmerman
On Thu, 22 May 2014 15:54:42 +0100
RW rwmailli...@googlemail.com wrote:

Ian But in fact this is a per-test setting, a subcategory of tflags.
Ian Do I have to specify it separately for every test?  Why?

RW The point is to set it for a small number of rules that are
RW sufficiently strong as to guarantee there will be no mislearning in
RW combination with the autolearn as spam threshold.

So, now I am really confused.  I think I did everything right in user_prefs:

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -2.00
bayes_auto_learn_threshold_spam 6.00
bayes_auto_learn_on_error 0

[snip]

tflags URIBL_DBL_SPAM autolearn_force
tflags URIBL_JP_SURBL autolearn_force
tflags URIBL_BLACK autolearn_force
tflags INVALID_DATE autolearn_force

Nonetheless:

X-Spam-Score: 6.9
X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
 HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
 T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
X-Spam-Autolearn: no autolearn_force=no



One suspect thing I see in the log:

May 24 20:29:58 host spamd[13561]: spamd: result: Y 6 -
BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK
scantime=1.9,size=6208,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
raddr=127.0.0.1,rport=60231,
mid=23931386609892239320827813806...@86adv5n4.disabilism.eu,bayes=1.00,
autolearn=no autolearn_force=no

Note the "6". Is it possible that SA truncates the score to an integer
for this purpose, and then treats it as a strict lower bound? That is,
if I set bayes_auto_learn_threshold_spam = 6.00, would the lowest score
to actually trigger autolearn be 7?

That is the only rational explanation I have, tortured as it is.

It sure looks like SA is going out of its way to force me to do manual
training :-(



Re: autolearn_force

2014-05-24 Thread Ian Zimmerman
> So, now I am really confused.  I think I did everything right in
> user_prefs:
>
> bayes_auto_learn  1
> bayes_auto_learn_threshold_nonspam -2.00
> bayes_auto_learn_threshold_spam 6.00
> bayes_auto_learn_on_error 0
>
> [snip]
>
> tflags URIBL_DBL_SPAM autolearn_force
> tflags URIBL_JP_SURBL autolearn_force
> tflags URIBL_BLACK autolearn_force
> tflags INVALID_DATE autolearn_force
>
> Nonetheless:
>
> X-Spam-Score: 6.9
> X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
>  HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
>  T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
> X-Spam-Autolearn: no autolearn_force=no

And here's a case where it doesn't autolearn ham (same user_prefs as above):

X-Spam-Status: No
X-Spam-Level: 
X-Spam-Score: -2.7
X-Spam-Tests: BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,
 FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,
 HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,RCVD_IN_DNSWL_LOW=-0.7,
 RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001
X-Spam-Autolearn: no autolearn_force=no

The documentation certainly doesn't say anything like the 3/3 and force
mechanism is in place for ham.  So this _should_ autolearn.  Right?  Right??



Re: autolearn_force

2014-05-22 Thread RW
On Wed, 21 May 2014 21:34:23 -0700
Ian Zimmerman wrote:

> I don't understand this setting, and reading the documentation doesn't
> help.
>
> It seems it should make Bayes learn spam whenever the total score
> surpasses the value of bayes_auto_learn_threshold_spam, and not
> require 3 points from header and body each; that would make it a
> global setting similar in purpose to bayes_auto_learn_threshold_spam.
>
> But in fact this is a per-test setting, a subcategory of tflags.  Do I
> have to specify it separately for every test?  Why?

The point is to set it for a small number of rules that are
sufficiently strong as to guarantee there will be no mislearning in
combination with the autolearn as spam threshold. 


It's probably best to create a single metarule for this - something
that eliminates the possibility of mistraining through a lot
of overlapping rules. I do something similar to get more spam into my
high-scoring folder. I assign a lot of the near-certain spam rules
to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then
count the number of classes.
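RW's class-counting idea can be sketched as follows. The rule-to-class mapping, including the relay-country rule name, is hypothetical. RW's own classes include BAYES for routing mail to a folder; it is left out here because a forced-autolearn trigger that depends on Bayes would feed Bayes its own verdict, the self-feeding the auto-learn logic is meant to avoid. In SpamAssassin itself this would be a single meta rule carrying tflags autolearn_force.

```python
# Hypothetical mapping of near-certain spam rules to independent evidence classes.
RULE_CLASSES = {
    "URIBL_BLACK": "uribl", "URIBL_DBL_SPAM": "uribl", "URIBL_JP_SURBL": "uribl",
    "RCVD_IN_SBL": "rbl", "RCVD_IN_XBL": "rbl",
    "MY_RELAYCOUNTRY_BAD": "relaycountry",  # hypothetical local rule
}


def independent_classes_hit(hits, rule_classes=RULE_CLASSES):
    """Count distinct classes among the hit rules; several rules of one class count once."""
    return len({rule_classes[name] for name in hits if name in rule_classes})


def strong_enough(hits, min_classes=2):
    """Hypothetical trigger: require agreement from at least min_classes classes."""
    return independent_classes_hit(hits) >= min_classes


print(strong_enough(["URIBL_BLACK", "URIBL_DBL_SPAM", "URIBL_JP_SURBL"]))  # False
print(strong_enough(["URIBL_BLACK", "RCVD_IN_XBL"]))                       # True
```

Three overlapping URI blocklist hits count as one piece of evidence, which is the mistraining risk RW describes.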



Re: autolearn_force

2014-05-22 Thread Ian Zimmerman
On Thu, 22 May 2014 15:54:42 +0100
RW rwmailli...@googlemail.com wrote:

Ian I don't understand this setting, and reading the documentation
Ian doesn't help.

Ian It seems it should make Bayes learn spam whenever the total score
Ian surpasses the value of bayes_auto_learn_threshold_spam, and not
Ian require 3 points from header and body each; that would make it a
Ian global setting similar in purpose to
Ian bayes_auto_learn_threshold_spam.

Ian But in fact this is a per-test setting, a subcategory of tflags.
Ian Do I have to specify it separately for every test?  Why?

RW The point is to set it for a small number of rules that are
RW sufficiently strong as to guarantee there will be no mislearning in
RW combination with the autolearn as spam threshold.

RW It's probably best to create a single metarule for this - something
RW that eliminates the possibility of mistraining through a lot of
RW overlapping rules. I do something similar to get more spam into my
RW high-scoring folder. I assign a lot of the near-certain spam rules
RW to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then
RW count the number of classes.

The problem I am trying to solve is that nearly all of my spam is
flagged due to body rules.  The header rules seem to be close to useless
with the latest campaigns - spammers seem to have learned enough to
avoid sending obvious stinking pieces of turd.  (The one exception is
patterns in the Message-ID, but I am afraid that will be short lived
too, and is insufficient by itself even now).

Thus, even if I set bayes_auto_learn_threshold_spam low, very few of my
spams are autolearned because of the 3/3 requirement.  The damn 3/3 is
my problem - how can I work around it?  If I have to spend an hour a day
manually training the classifier the spammers have won :-(

By the way, how are meta rules counted for this purpose?  The
documentation says nothing about that.



autolearn_force

2014-05-21 Thread Ian Zimmerman
I don't understand this setting, and reading the documentation doesn't
help.

It seems it should make Bayes learn spam whenever the total score
surpasses the value of bayes_auto_learn_threshold_spam, and not require
3 points from header and body each; that would make it a global setting
similar in purpose to bayes_auto_learn_threshold_spam.

But in fact this is a per-test setting, a subcategory of tflags.  Do I
have to specify it separately for every test?  Why?

Or is there another way to bypass the 3/3 requirement?
