Help with Bayes auto-learn

2005-05-13 Thread Geoff Sweet
I would like to enable the Bayes system with auto-learning.  I thought 
that I had my config setup correctly but apparently I don't.  My config 
looks like this:

##
# How we want to modify the email
rewrite_header subject [**SPAM**]
report_safe 0
#Bayes learning system
use_bayes 1
bayes_auto_learn 1
# Define the sensitivity level. Standard level is 5.
required_hits 6.8
# Enable SpamAssassin's RBL checking features :
skip_rbl_checks 0
rbl_timeout 3
num_check_received 3
score RCVD_IN_BL_SPAMCOP_NET 3
report_header 1
use_terse_report 1
##
From my reading of the FAQ and the wiki, I thought this would enable
Bayes and turn on auto-learning for spam that scores higher than the
default of 12.  But in my logs I end up with this:

2005-05-12 23:30:33.240563500 2005-05-13 06:30:33 [88906] i: connection from localhost.whootis.com [127.0.0.1] at port 4737
2005-05-12 23:30:33.333094500 2005-05-13 06:30:33 [88906] i: processing message [EMAIL PROTECTED] for qmaild:10004.
2005-05-12 23:30:33.431814500 2005-05-13 06:30:33 [88906] i: identified spam (23.2/6.8) for qmaild:10004 in 0.2 seconds, 1311 bytes.
2005-05-12 23:30:33.432514500 2005-05-13 06:30:33 [88906] i: result: Y 23 - BAYES_99,FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MSGID_RANDY,NORMAL_HTTP_TO_IP,RCVD_BY_IP,RCVD_DOUBLE_IP_LOOSE,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SUBJ_ILLEGAL_CHARS scantime=0.2,size=1311,mid=[EMAIL PROTECTED],bayes=0.999,autolearn=no

Does the autolearn=no mean that this message has not been submitted to 
bayes for auto-learn?  And if not, can someone steer me in the right 
direction for getting my config setup correctly?

Thanks very much,
Geoff Sweet


RE: Help with Bayes auto-learn

2005-05-13 Thread George Breahna
I could swear I saw this question in at least 20 different messages, not
to mention on the website.

I really recommend you research your question before asking it.

autolearn=no means that it didn't 'learn' this message.

Other possible states are 'spam', 'ham' and ... 'disabled'.

If autolearn were to be disabled, you would see this last one.
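A quick way to see which state spamd reported is to pull the key=value fields off the end of the `result:` log line. This is a minimal Python sketch, assuming the log format shown in the question; the helper name `parse_result_fields` is hypothetical, and the meanings in `AUTOLEARN_MEANINGS` are paraphrased from this thread (including 'failed', which comes up later), not an exhaustive list of SpamAssassin's values.

```python
import re

# Paraphrased from this thread; NOT an exhaustive list of the values
# SpamAssassin can report in the autolearn= field.
AUTOLEARN_MEANINGS = {
    "no": "checked, but not learned",
    "spam": "auto-learned as spam",
    "ham": "auto-learned as ham",
    "disabled": "auto-learning is turned off",
    "failed": "auto-learning was attempted but failed (e.g. Bayes DB locked)",
}


def parse_result_fields(line: str) -> dict:
    """Collect key=value pairs (scantime, size, bayes, autolearn, ...)
    from a spamd 'result:' log line."""
    return dict(re.findall(r"(\w+)=([^,\s]+)", line))


line = ("result: Y 23 - BAYES_99,HTML_MESSAGE,MIME_HTML_ONLY "
        "scantime=0.2,size=1311,bayes=0.999,autolearn=no")
fields = parse_result_fields(line)
status = fields.get("autolearn", "")
print(status, "->", AUTOLEARN_MEANINGS.get(status, "unknown value"))
# prints: no -> checked, but not learned
```

The sample line is shortened from the one in the question; the rule list contains no '=' so only the trailing fields are picked up.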


Re: Help with Bayes auto-learn

2005-05-13 Thread wolfgang
In an older episode (Friday 13 May 2005 08:38), Geoff Sweet wrote:
 I would like to enable the Bayes system with auto-learning.  I thought 
 that I had my config setup correctly but apparently I don't.  My config 
 looks like this:
 
 ##
 # How we want to modify the email
 rewrite_header subject [**SPAM**]
 report_safe 0
 
 #Bayes learning system
 use_bayes 1
 bayes_auto_learn 1

In an older episode (Friday 13 May 2005 10:17), George Breahna wrote:
 I really recommend you research your question before asking it.

Good point. Anyway:

man Mail::SpamAssassin::Conf 
and
http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html
would tell you:

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number 
of ham (non-spam) and spam have been learned. The default is 200 of each ham 
and spam, but you can tune these up or down with these two settings.
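The activation rule quoted above is a two-sided threshold: both counts must reach their minimum. Here is a minimal Python sketch; `BayesDbCounts` and `bayes_is_active` are hypothetical names, the counts would come from something like the nham/nspam lines of `sa-learn --dump magic`, and the at-or-above comparison is an assumption based on the wording "does not activate until".

```python
from dataclasses import dataclass


@dataclass
class BayesDbCounts:
    """Hypothetical holder for how many messages the Bayes DB has learned."""
    nham: int
    nspam: int


def bayes_is_active(counts: BayesDbCounts,
                    min_ham: int = 200,   # bayes_min_ham_num default
                    min_spam: int = 200   # bayes_min_spam_num default
                    ) -> bool:
    """Bayes only produces BAYES_xx results once BOTH minimums are reached
    (assumed to be an at-or-above comparison)."""
    return counts.nham >= min_ham and counts.nspam >= min_spam


print(bayes_is_active(BayesDbCounts(nham=150, nspam=900)))  # False: too little ham
print(bayes_is_active(BayesDbCounts(nham=250, nspam=250)))  # True
```

Lots of spam does not make up for too little ham, which is why both settings exist.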

For information on how to learn the required number of messages, see

man sa-learn

regards,

wolfgang



Re: Help with Bayes auto-learn

2005-05-13 Thread Joe Zitnik

Yes, but his results list BAYES_99 as one of the rules hit, which means
Bayes is active, which means it has been fed the necessary 200 spam and
200 ham. If it hadn't been fed the necessary spam and ham, it would not
have been given a BAYES score at all. The fact that the mail was not
autolearned could mean that it did not fall within the autolearn range OR
that an identical message had already been learned. With a score like
BAYES_99, it is probably the latter.
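Joe's inference can be checked mechanically by looking for any BAYES_xx rule in the comma-separated rule list of the result line. A minimal Python sketch with a hypothetical helper name, `bayes_rules_hit`, assuming (as Joe does) that a BAYES_ rule only appears once the minimum ham/spam counts have been reached:

```python
from typing import List


def bayes_rules_hit(tests: str) -> List[str]:
    """Return any BAYES_xx rule names from a comma-separated rule list."""
    return [name.strip() for name in tests.split(",")
            if name.strip().startswith("BAYES_")]


tests = "BAYES_99,FORGED_YAHOO_RCVD,HTML_MESSAGE,MIME_HTML_ONLY"
hits = bayes_rules_hit(tests)
print(hits)  # ['BAYES_99']
print("Bayes is active" if hits else "no BAYES_xx rule hit")
```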


Re: Help with Bayes auto-learn

2005-05-13 Thread wolfgang
In an older episode (Friday 13 May 2005 12:26), Joe Zitnik wrote:
 Yes, but his scoring list BAYES_99 as one of the scores, which means
 bayes is active, which means it has been fed the necessary 200 spam and
 200 ham.  If it hadn't been fed the necessary spam and ham, it would not
 have been given a BAYES score at all. 

Thanks for pointing that out; I had missed that.

wolfgang


Re: Help with Bayes auto-learn

2005-05-13 Thread Matt Kettler
At 02:38 AM 5/13/2005, Geoff Sweet wrote:
2005-05-12 23:30:33.432514500 2005-05-13 06:30:33 [88906] i: result: Y 23 - BAYES_99,FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MSGID_RANDY,NORMAL_HTTP_TO_IP,RCVD_BY_IP,RCVD_DOUBLE_IP_LOOSE,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SUBJ_ILLEGAL_CHARS scantime=0.2,size=1311,mid=[EMAIL PROTECTED],bayes=0.999,autolearn=no

Does the autolearn=no mean that this message has not been submitted to 
bayes for auto-learn?  And if not, can someone steer me in the right 
direction for getting my config setup correctly?
First, I'm assuming you're using SA 3.0.0 or higher. If not, please specify
your version and I'll correct my message (some of the details differ).

That does mean the message was not autolearned. However, it does not mean
that no messages will be autolearned. In SA 3.0, if autolearning were
disabled or failing, you would have seen autolearn=disabled or
autolearn=failed, not autolearn=no.

The requirements for autolearning are considerably more complex than just 
total score over xx.

The following things have to happen:
Note: ALL scores referenced below are the learning score. The learning score
is NOT the same as the final spam score. It is the score recalculated as if
Bayes were disabled, *including* switching to the non-Bayes scoreset. Also,
AWL, whitelist, and blacklist rules don't count towards this score.

1) total learning score over bayes_auto_learn_threshold_spam (default 12)
2) learning score of header rules must be over 3.0
3) learning score of body rules must be over 3.0
4) existing Bayes learning must not be strongly ham (i.e. don't learn as
spam anything that would otherwise get BAYES_00'ed)
5) From addresses (including Return-Path, etc) must not match a 
bayes_ignore_from statement
6) To addresses (including Cc, etc) must not match a bayes_ignore_to
statement
7) The bayes DB must not be locked by some other SA process (another 
learner, expiry, etc). Note: this test results in autolearn=failed.
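To make the checklist concrete, here is a simplified Python model of the spam-side decision. It is a sketch, not SpamAssassin's code: the `RuleHit`/`Message` types, the plain header/body split, the excluded rule-name prefixes, and the 0.01 "strongly ham" cutoff are all assumptions for illustration, and "over" is treated as at-or-above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical rule-name prefixes excluded from the learning score
# (Bayes itself, AWL, and white/blacklist rules, per the note above).
EXCLUDED_PREFIXES = ("BAYES_", "AWL", "USER_IN_WHITELIST", "USER_IN_BLACKLIST")


@dataclass
class RuleHit:
    name: str
    area: str           # "header" or "body" (simplified classification)
    learn_score: float  # score taken from the non-Bayes scoreset


@dataclass
class Message:
    hits: List[RuleHit]
    bayes_prob: Optional[float] = None  # prior Bayes probability, if any
    from_ignored: bool = False  # From/Return-Path matched bayes_ignore_from
    to_ignored: bool = False    # To/Cc matched bayes_ignore_to
    db_locked: bool = False     # another process holds the Bayes DB lock


def autolearn_spam(msg: Message,
                   threshold_spam: float = 12.0,  # bayes_auto_learn_threshold_spam
                   min_area_points: float = 3.0,  # header/body minimum
                   strong_ham_prob: float = 0.01  # assumed BAYES_00 upper bound
                   ) -> str:
    """Return 'spam', 'no' or 'failed', following the checklist above."""
    counted = [h for h in msg.hits if not h.name.startswith(EXCLUDED_PREFIXES)]
    total = sum(h.learn_score for h in counted)
    header = sum(h.learn_score for h in counted if h.area == "header")
    body = sum(h.learn_score for h in counted if h.area == "body")

    if total < threshold_spam:        # 1) total learning score
        return "no"
    if header < min_area_points:      # 2) header points
        return "no"
    if body < min_area_points:        # 3) body points
        return "no"
    if msg.bayes_prob is not None and msg.bayes_prob < strong_ham_prob:
        return "no"                   # 4) Bayes already says strongly ham
    if msg.from_ignored or msg.to_ignored:
        return "no"                   # 5) and 6) bayes_ignore_* matches
    if msg.db_locked:                 # 7) DB locked: learning fails
        return "failed"
    return "spam"


# Hypothetical message: plenty of header points, too few body points.
example = Message(hits=[RuleHit("HYPOTHETICAL_HDR_A", "header", 6.0),
                        RuleHit("HYPOTHETICAL_HDR_B", "header", 5.0),
                        RuleHit("HYPOTHETICAL_BODY_A", "body", 2.0)])
print(autolearn_spam(example))  # no
```

In the sample, the total learning score is 13 and the header rules contribute 11, but the body rules contribute only 2, so the sketch returns no even though the total clears 12.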

See also:
http://wiki.apache.org/spamassassin/AutolearningNotWorking