Re: Help with Bayes-SQL-Configuration

2018-07-18 Thread Kris Deugau

Julian Kippels wrote:

Hi,

I am in the process of setting up a bayes-sql-database but I am unsure
of whether I want to set the bayes_sql_override_username option.
I would like to have per-user-bayes scores, so that scores from user A
will not interfere with messages sent to user B.
If I understand it correctly, no matter what I use as the username for
sa-learn, this option when set will override it with whatever I put
there. Does this not effectively disable per-user Bayes scores and
bundle them all under one meta-user?
However I have read, when using amavis (which I do) to call SA, I
should set this variable to the username which runs the amavis process.
What should I do?


You're getting two similarly-named but unrelated options confused.

bayes_sql_override_username explicitly and specifically disables the 
per-user Bayes DB setup you're asking for.  (IMO this is of limited use 
at any scale larger than "a handful of technically minded users", but it 
*can* work as long as your users are willing to feed the system.  For 
most other setups you're going to get better results by using a single, 
centrally-maintained site-wide Bayes database.)


The option you're probably looking for is "bayes_sql_username" (and the 
related bayes_sql_dsn and bayes_sql_password).  This sets the SQL 
connection user - not the SA/Bayes user! - which is what you need if you 
want to keep many per-user Bayes datasets in an SQL database instead of 
one of the other backends.


Calling SpamAssassin from Amavis or some other glue layer that operates 
during the same part of mail flow means that it's sometimes extremely 
difficult to make use of per-user Bayes and other settings, because you 
have one message with many recipients.  Full per-user SA settings are 
IMO best handled by calling SA on final mail delivery, where you are 
guaranteed to be calling SA for exactly one recipient on any given call. 
 The downside of this method is that if a message originally had 
multiple recipients, SA will be called for each of those recipients.
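To make the distinction concrete, here is a rough Python model (not SpamAssassin's actual code) of which name keys the Bayes data for each scan. The function names are hypothetical. The SQL connection user set by bayes_sql_username is deliberately left out, since it only controls how SA logs into the database, not whose Bayes data is used.

```python
from typing import Dict, Iterable, Optional


def bayes_username(sa_user: str, override_username: Optional[str] = None) -> str:
    """Return the name whose Bayes data a scan would use (illustrative model only)."""
    # When an override is configured, every scan is forced onto the same
    # dataset, which is exactly what defeats per-user Bayes.
    return override_username if override_username else sa_user


def datasets_for_delivery(recipients: Iterable[str],
                          override_username: Optional[str] = None) -> Dict[str, str]:
    """Model calling SA once per recipient at final delivery."""
    # One call per recipient: a message to N recipients is scanned N times.
    return {rcpt: bayes_username(rcpt, override_username) for rcpt in recipients}


print(datasets_for_delivery(["alice", "bob"]))
# {'alice': 'alice', 'bob': 'bob'}
print(datasets_for_delivery(["alice", "bob"], override_username="amavis"))
# {'alice': 'amavis', 'bob': 'amavis'}
```

With no override each recipient gets their own dataset; with the override set, every scan lands on the same one.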


-kgd


Help with Bayes-SQL-Configuration

2018-07-18 Thread Julian Kippels
Hi,

I am in the process of setting up a bayes-sql-database but I am unsure
of whether I want to set the bayes_sql_override_username option.
I would like to have per-user-bayes scores, so that scores from user A
will not interfere with messages sent to user B.
If I understand it correctly, no matter what I use as the username for
sa-learn, this option when set will override it with whatever I put
there. Does this not effectively disable per-user Bayes scores and
bundle them all under one meta-user?
However I have read, when using amavis (which I do) to call SA, I
should set this variable to the username which runs the amavis process.
What should I do?

Thanks
Julian


Re: Help with bayes

2008-11-18 Thread Troy Settle

Kai Schaetzl wrote:

Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:

I'm having a major problem with the bayes system.  I cleared the bayes 
database and let it start re-learning.  Once it kicked in, I again 
started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
messages.


How did you let it start re-learning? What's the output of sa-learn dump 
magic?
From incoming mail.  I'm still working on building a corpus suitable 
for sa-learn.


$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  44946  0  non-token data: nspam
0.000  0  36757  0  non-token data: nham
0.000  0 545675  0  non-token data: ntokens
0.000  0 1226964376  0  non-token data: oldest atime
0.000  0 1227033356  0  non-token data: newest atime
0.000  0 1227033315  0  non-token data: last journal sync atime
0.000  0 1227007705  0  non-token data: last expiry atime
0.000  0  43200  0  non-token data: last expire atime delta
0.000  0 393274  0  non-token data: last expire reduction count
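If you want to track these counters over time, a small parser helps. This is a sketch that assumes the row layout shown above: the value sits in the third numeric column and the label follows "non-token data:". Rows that a mail client has wrapped will come out with truncated labels until they are joined back onto one line.

```python
import re
from typing import Dict

# Matches rows such as "0.000  0  44946  0  non-token data: nspam".
# The value of interest is the third numeric column.
MAGIC_ROW = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s+(.+?)\s*$"
)


def parse_dump_magic(text: str) -> Dict[str, int]:
    """Turn `sa-learn --dump magic` output into a {label: value} dict."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        match = MAGIC_ROW.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


sample = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  44946  0  non-token data: nspam
0.000  0  36757  0  non-token data: nham
0.000  0 545675  0  non-token data: ntokens
"""
magic = parse_dump_magic(sample)
print(magic["nspam"], magic["nham"], magic["ntokens"])  # 44946 36757 545675
```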



FWIW, how bad would I screw things up if I were to override the BAYES_00 
score to 0?



--
 Troy Settle
 Pulaski Networks ~ http://www.psknet.com
 866.477.5638 ~ 540.994.4254





Re: Help with bayes

2008-11-18 Thread Karsten Bräckelmann
On Tue, 2008-11-18 at 15:19 -0500, Troy Settle wrote:
 Kai Schaetzl wrote:
  Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:
 
  I'm having a major problem with the bayes system.  I cleared the bayes 
  database and let it start re-learning.  Once it kicked in, I again 
  started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
  messages.
 
  How did you let it start re-learning? What's the output of sa-learn dump 
  magic?

 From incoming mail.  I'm still working on building a corpus suitable 
 for sa-learn.

You *need* to train on error.  Also, you definitely will want to
manually learn, at the very least until Bayes has been trained properly.

If you rely solely on auto-learning, there are a great many spams that
will not be learned. And those are pretty much exactly the ones where
Bayes can make a difference!

  
http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html#learning_options
  
http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

By default, auto-learning will *not* learn any spam with a total score
below 12 (calculated without Bayes, etc.), or with body or header test
scores below 3 points each. It won't learn ham with a score above 0.1
either. This is a safety measure.
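A simplified Python model of those default thresholds, assuming the values quoted above (12 for spam, 0.1 for ham, at least 3 header and 3 body points for spam). The scores it takes are learning scores, i.e. computed without Bayes; the real plugin checks more than this, so treat it as a sketch of the threshold logic only.

```python
from typing import Optional

# Defaults as described in the message above; check your version's docs.
SPAM_THRESHOLD = 12.0
HAM_THRESHOLD = 0.1
MIN_HEADER_POINTS = 3.0
MIN_BODY_POINTS = 3.0


def autolearn_class(learning_score: float, header_points: float,
                    body_points: float) -> Optional[str]:
    """Simplified model of the default auto-learn decision."""
    if learning_score < HAM_THRESHOLD:
        return "ham"
    if (learning_score >= SPAM_THRESHOLD
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS):
        return "spam"
    return None  # everything in between is never auto-learned


for score, head, body in [(-1.0, 0.0, 0.0), (8.0, 4.0, 4.0),
                          (15.0, 1.0, 14.0), (15.0, 4.0, 4.0)]:
    print(score, head, body, autolearn_class(score, head, body))
```

The 8-point message is a typical "caught but not learned" spam: it is exactly the kind that has to be fed to sa-learn by hand.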


 FWIW, how bad would I screw things up if I were to override the BAYES_00 
 score to 0?

That's not gonna solve your problems. You'd better properly train Bayes
on the stuff not auto-learned, so it will eventually learn the
difference between ham and spam. So far it only knows about the extreme
ends, which really don't need Bayes to make a difference anyway.

  guenther


-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Help with bayes

2008-11-18 Thread Jeff Mincy
   From: Troy Settle [EMAIL PROTECTED]
   Date: Tue, 18 Nov 2008 15:19:56 -0500
   
   Kai Schaetzl wrote:
Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:
   
I'm having a major problem with the bayes system.  I cleared the bayes 
database and let it start re-learning.  Once it kicked in, I again 
started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
messages.

How many and what percentage spam messages are getting BAYES_00?
A few spam messages getting BAYES_00/05/20 is ok.  If you are getting
a large percentage of spam hitting BAYES_00  then you have
some sort of problem with the messages that are being learned.  Most
likely you are (auto)learning spam messages as ham.  Any mistakes made
in learning need to be corrected by relearning those messages.  Any
spam message that has autolearn=ham has to be relearned as spam.
Or perhaps you are not learning from enough spam messages.
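One quick way to put a number on "how many" is to take a batch of messages you have verified by hand as spam and count how many of them hit BAYES_00. The input format here (a set of rule names per message) is an assumption; you would fill it from your own logs or X-Spam-Status headers.

```python
from typing import Iterable, Set


def bayes00_rate(confirmed_spam_hits: Iterable[Set[str]]) -> float:
    """Fraction of hand-verified spam messages that hit BAYES_00."""
    hits = list(confirmed_spam_hits)
    if not hits:
        return 0.0
    return sum("BAYES_00" in rules for rules in hits) / len(hits)


# Hypothetical rule-hit sets for four messages confirmed to be spam.
sample = [
    {"BAYES_00", "HTML_MESSAGE"},
    {"BAYES_99", "URIBL_BLACK"},
    {"BAYES_00"},
    {"BAYES_50", "RCVD_IN_XBL"},
]
print(f"{bayes00_rate(sample):.0%}")  # 50%
```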

For spam messages getting BAYES_00 what do you get for the following:
 spamassassin -D --test-mode --debug all,bayes < msg.txt 2>&1 | grep bayes:
Which spammy looking tokens have low values?

How did you let it start re-learning? What's the output of sa-learn dump 
magic?
From incoming mail.  I'm still working on building a corpus suitable 
   for sa-learn.
   
   $ sa-learn --dump magic
   0.000  0  3  0  non-token data: bayes db version
   0.000  0  44946  0  non-token data: nspam
   0.000  0  36757  0  non-token data: nham
   0.000  0 545675  0  non-token data: ntokens
...

You should probably increase the size of the Bayes database, eg
 bayes_expiry_max_db_size 200
   
   FWIW, how bad would I screw things up if I were to override the BAYES_00 
   score to 0?

With proper training this should not be necessary.  Also, 0 would
disable the test, so you won't get any BAYES_00 hits.  A small,
temporary non-zero score would be better, so you can continue to
track the problem.

-jeff


Re: Help with bayes

2008-11-18 Thread Kai Schaetzl
Troy Settle wrote on Tue, 18 Nov 2008 15:19:56 -0500:

 From incoming mail.

well, but how? By auto-learning? In that case you are just multiplying your 
problem. It seems a lot of spam gets miscategorized as ham. Auto-learning 
that spam as ham means enforcing this miscategorization and that's what you 
see as a result.

 0.000  0  44946  0  non-token data: nspam
 0.000  0  36757  0  non-token data: nham
 0.000  0 545675  0  non-token data: ntokens

Looks fine, if the ham tokens really are ham.

 0.000  0 1227007705  0  non-token data: last expiry atime

 0.000  0 393274  0  non-token data: last expire reduction count

Hm, you just did an expire that slashed your db almost in half? You may 
want to let it grow a bit.
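Kai's "almost in half" can be checked from the numbers quoted above. This arithmetic sketch assumes tokens only leave the DB through expiry, so today's ntokens is an upper bound on what was left right after the last expire, which makes the result a lower bound on the share removed.

```python
# Figures from the `sa-learn --dump magic` output quoted above.
ntokens_now = 545675      # non-token data: ntokens
last_reduction = 393274   # non-token data: last expire reduction count

# Assumption: tokens only leave the DB through expiry, so the DB held at most
# ntokens_now tokens right after that expire. Hence this is a lower bound.
removed_share = last_reduction / (last_reduction + ntokens_now)
print(f"last expiry removed at least {removed_share:.1%} of the tokens")  # 41.9%
```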

 
 FWIW, how bad would I screw things up if I were to override the BAYES_00 
 score to 0?

As it is causing you grief now, probably not much. It means that real ham 
that also hits BAYES_00 will not enjoy the benefit of its 
negative score. Maybe switching Bayes off for a while is better.

I would start over with that db.

1. Stop Bayes and check how categorization works without it. In theory, 
you should already see a good number of spam messages miscategorized as 
ham without Bayes.

2. Collect some ham and spam where you can be absolutely sure that they are 
in the right category, and train Bayes with these. Stop auto-learning 
for Bayes for a while.

3. Switch Bayes on with your new DB and check whether it seems to 
categorize better now.

4. If it does, switch auto-learning on, but move the auto-learning 
threshold for ham down a bit, so that the chance of spam creeping in is 
smaller.


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





Re: Help with bayes

2008-11-18 Thread James Wilkinson
Kai Schaetzl wrote:
 well, but how? By auto-learning? In that case you are just multiplying your 
 problem. It seems a lot of spam gets miscategorized as ham. Auto-learning 
 that spam as ham means enforcing this miscategorization and that's what you 
 see as a result.

When SpamAssassin decides whether or not to learn a message, it does not
take Bayes scores into account.

So if you have a message that only hits BAYES_00 (with a score of either
-2.3 or -2.6) and another rule with a score of 0.2, that message will
not be learnt (unless you change the limits), because 0.2 is greater
than 0.1 (the limit).
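James's example as arithmetic. This sketch assumes the learning score is simply the sum of the non-BAYES_* rule scores; the real code also drops AWL and white/blacklist rules and may switch score sets, and SOME_RULE is a placeholder name.

```python
from typing import Dict


def learning_score(rule_scores: Dict[str, float]) -> float:
    """Sum rule scores the way auto-learn does here: ignoring every BAYES_* rule."""
    # Simplification: the real code also drops AWL/whitelist rules and may use
    # a different score set once Bayes is ignored.
    return sum(score for rule, score in rule_scores.items()
               if not rule.startswith("BAYES_"))


hits = {"BAYES_00": -2.6, "SOME_RULE": 0.2}  # SOME_RULE is a placeholder name
final = sum(hits.values())
learn = learning_score(hits)
print(round(final, 1), learn, learn < 0.1)  # -2.4 0.2 False
```

The message is clearly ham by its final score, yet it is not auto-learned as ham, because the learning score of 0.2 is not below 0.1.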

Hope this helps,

James.

-- 
E-mail: james@ | ‘Sir, they’ve taken Mr. Rimmer!’
aprilcottage.co.uk | ‘Quick, let’s get out of here before they bring him
   | back!’
   | -- Kryten and Cat, ‘Red Dwarf’


Re: Help with bayes

2008-11-18 Thread Kai Schaetzl
James Wilkinson wrote on Tue, 18 Nov 2008 21:56:34 +:

  well, but how? By auto-learning? In that case you are just multiplying your 
  problem. It seems a lot of spam gets miscategorized as ham. Auto-learning 
  that spam as ham means enforcing this miscategorization and that's what you 
  see as a result.
 
 When SpamAssassin decides whether or not to learn a message, it does not
 take Bayes scores into account.
 
 So if you have a message that only hits BAYES_00 (with a score of either
 -2.3 or -2.6) and another rule with a score of 0.2, that message will
 not be learnt (unless you change the limits), because 0.2 is greater
 than 0.1 (the limit).

Very well, but that doesn't affect anything I wrote. ;-) I think you 
misunderstood my explanation.

Kai






Help with bayes

2008-11-17 Thread Troy Settle
I'm having a major problem with the bayes system.  I cleared the bayes 
database and let it start re-learning.  Once it kicked in, I again 
started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
messages.


Can someone point me to some good reading material to better understand 
why this is happening, and how to prevent it?


SA is running under a single user site-wide (about 2500 mailboxes 
total).  Is this screwing things up for me?  Would I have better results 
if I were to run SA for each user separately?


Thanks,

--
 Troy Settle
 Pulaski Networks
 866.477.5638
 



Re: Help with bayes

2008-11-17 Thread Kai Schaetzl
Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:

 I'm having a major problem with the bayes system.  I cleared the bayes 
 database and let it start re-learning.  Once it kicked in, I again 
 started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
 messages.

How did you let it start re-learning? What's the output of sa-learn dump 
magic?

 SA is running under a single user site-wide (about 2500 mailboxes 
 total).  Is this screwing things up for me?  Would I have better results 
 if I were to run SA for each user separately?

If your users each get enough mail to produce enough Bayes tokens, maybe.

Kai






Help with BAYES + MYSQL

2006-03-02 Thread sinofzik

Hope someone can help me!

I am using SpamAssassin 3.0.1
(users stored in MySQL, with vpopmail and qmail).
Slackware version 10.2
MySQL version 5.0

My problem:
When I use the standard Bayes configuration (.db files on the hard drive),
everything works fine.

When I switch Bayes to MySQL, strange things happen.
The first time I receive an email, the system tells me that the user is
not in the Bayes tables, but at the end it uses learn mode and everything
is OK.
After this, when the same user starts to receive email, the system dies
with a segmentation fault (see debug example 1).

And below, a little debug example.

Any idea what is happening?


##
debug example 1
###
Thu Mar  2 14:01:24 2006 [17152] dbg: prefork: ordered 17156 to accept
Thu Mar  2 14:01:24 2006 [17152] dbg: prefork: sysread(6) not ready,  
wait max 300 secs
Thu Mar  2 14:01:24 2006 [17156] info: spamd: connection from  
localhost [127.0.0.1] at port 48847

Thu Mar  2 14:01:24 2006 [17152] dbg: prefork: child 17156: entering state 2
Thu Mar  2 14:01:24 2006 [17152] dbg: prefork: new lowest idle kid: 17157
Thu Mar  2 14:01:24 2006 [17156] info: spamd: handle_user unable to  
find user: [EMAIL PROTECTED]
Thu Mar  2 14:01:24 2006 [17156] dbg: config: Conf::SQL: executing  
SQL: select preference, value  from userpref where username =  
'[EMAIL PROTECTED]' or username = '@GLOBAL' order by username asc
Thu Mar  2 14:01:24 2006 [17156] dbg: config: retrieving prefs for  
[EMAIL PROTECTED] from SQL server

Thu Mar  2 14:01:24 2006 [17156] dbg: info: user has changed
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: using username:  
[EMAIL PROTECTED]

Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: database connection established
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: found bayes db version 3
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: Using userid: 4
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: not available for
scanning, only 0 spam(s) in bayes DB < 10

Thu Mar  2 14:01:24 2006 [17156] dbg: config: score set 1 chosen.
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: name server:  
192.168.100.105, LocalAddr: 0.0.0.0
Thu Mar  2 14:01:24 2006 [17156] info: spamd: processing message  
[EMAIL PROTECTED] for  
[EMAIL PROTECTED]:0

Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: database connection established
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: found bayes db version 3
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: Using userid: 4
Thu Mar  2 14:01:24 2006 [17156] dbg: bayes: not available for
scanning, only 0 spam(s) in bayes DB < 10
Thu Mar  2 14:01:24 2006 [17156] dbg: received-header: parsed as [  
ip=209.73.178.172 rdns=web60524.mail.yahoo.com  
helo=web60524.mail.yahoo.com by=nisyros.psmi.com.br ident= envfrom=  
intl=0 id= auth= ]
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: looking up A records for  
'nisyros.psmi.com.br'
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: A records for  
'nisyros.psmi.com.br': 201.64.97.21 201.64.97.21 201.64.97.21
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: looking up A records for  
'nisyros.psmi.com.br'
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: A records for  
'nisyros.psmi.com.br': 201.64.97.21 201.64.97.21 201.64.97.21
Thu Mar  2 14:01:24 2006 [17156] dbg: received-header: 'by'  
nisyros.psmi.com.br has public IP 201.64.97.21
Thu Mar  2 14:01:24 2006 [17156] dbg: received-header: relay  
209.73.178.172 trusted? no internal? no
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: looking up PTR record for  
'201.64.97.17'
Thu Mar  2 14:01:24 2006 [17156] dbg: dns: PTR for '201.64.97.17':  
'nagios.psmi.com.br'
Thu Mar  2 14:01:24 2006 [17156] dbg: received-header: parsed as [  
ip=201.64.97.17 rdns=nagios.psmi.com.br helo=  
by=web60524.mail.yahoo.com ident= envfrom= intl=0 id= auth= ]
Thu Mar  2 14:01:24 2006 [17156] dbg: received-header: relay  
201.64.97.17 trusted? no internal? no

Thu Mar  2 14:01:24 2006 [17156] dbg: metadata: X-Spam-Relays-Trusted:
Thu Mar  2 14:01:24 2006 [17156] dbg: metadata:  
X-Spam-Relays-Untrusted: [ ip=209.73.178.172  
rdns=web60524.mail.yahoo.com helo=web60524.mail.yahoo.com  
by=nisyros.psmi.com.br ident= envfrom= intl=0 id= auth= ] [  
ip=201.64.97.17 rdns=nagios.psmi.com.br helo=  
by=web60524.mail.yahoo.com ident= envfrom= intl=0 id= auth= ]

Thu Mar  2 14:01:24 2006 [17156] dbg: message:  MIME PARSER START 
Thu Mar  2 14:01:24 2006 [17156] dbg: message: main message type:  
multipart/alternative
Thu Mar  2 14:01:24 2006 [17156] dbg: message: parsing multipart, got  
boundary: 0-1994749066-1141318697=:2926
Thu Mar  2 14:01:24 2006 [17156] dbg: message: found part of type  
text/plain, boundary: 0-1994749066-1141318697=:2926

Thu Mar  2 14:01:24 2006 [17156] dbg: message: parsing normal part
Thu Mar  2 14:01:24 2006 [17156] dbg: message: added part, type: text/plain
Thu Mar  2 14:01:24 2006 [17156] dbg: message: found part of type  
text/html, boundary: 0-1994749066-1141318697=:2926

Thu Mar  2 14:01:24 2006 

Help with bayes configuration

2005-11-18 Thread Pierre Faudon

Hello,

I installed SpamAssassin a month ago with the Bayes auto-learn option, but 60% of spam is still not detected. There was nothing in my Bayes DB ...

[EMAIL PROTECTED] root]# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

Today I rebuilt the DB, but it still does not seem to work ...
[EMAIL PROTECTED] root]# sa-learn --ham --no-rebuild /etc/mail/spamassassin/
The --no-rebuild option has been deprecated. Please use --no-sync instead.
Learned from 5 message(s) (5 message(s) examined).
[EMAIL PROTECTED] root]# sa-learn --spam --no-rebuild /etc/mail/spamassassin/
The --no-rebuild option has been deprecated. Please use --no-sync instead.
Learned from 6 message(s) (6 message(s) examined).
[EMAIL PROTECTED] root]#
[EMAIL PROTECTED] root]# sa-learn --rebuild
The --rebuild option has been deprecated. Please use --sync instead.
synced Bayes databases from journal in 0 seconds: 549 unique entries (549 total entries)
[EMAIL PROTECTED] root]#
[EMAIL PROTECTED] spamassassin]# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 6 0 non-token data: nspam
0.000 0 5 0 non-token data: nham
0.000 0 337 0 non-token data: ntokens
0.000 0 1132317402 0 non-token data: oldest atime
0.000 0 1132317451 0 non-token data: newest atime
0.000 0 1132317466 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count



My spamassassin configuration :

[EMAIL PROTECTED] spamassassin]# cat local.cf
# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
# rewrite_header Subject *SPAM*
# report_safe 1
trusted_networks 10. 192.168.1.
# lock_method flock
# Scoring
required_score 5
# Score for a spam probability between 50 and 60%:
score DCC_CHECK 4.000
score RAZOR2_CHECK 2.500
score BAYES_60 3
# Score for a probability between 60 and 70%
score BAYES_70 4
score BAYES_80 4.8
score BAYES_95 5
score BAYES_99 6
#user_scores_dsn DBI:mysql:spamassassin:127.0.0.1
#user_scores_sql_username spamassassin
#user_scores_sql_password password
# Encapsulation?
report_safe 0

dns_available yes
# Bayes settings
bayes_path /etc/mail/spamassassin/
bayes_file_mode 0777
use_auto_whitelist 1
auto_whitelist_path /etc/mail/spamassassin/whitelist
use_bayes 1
use_bayes_rules 1
bayes_auto_learn 1
bayes_auto_learn_threshold_spam 25
bayes_auto_learn_threshold_nonspam -5
bayes_min_ham_num 60
bayes_min_spam_num 100
#required_hits 2.6
rewrite_subject 1
subject_tag *SPAM*
# Enable or disable network checks
skip_rbl_checks 0
use_razor2 1
use_dcc 1
use_pyzor 0
# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
# - english french
# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.

[EMAIL PROTECTED] spamassassin]#







Help with Bayes auto-learn

2005-05-13 Thread Geoff Sweet
I would like to enable the Bayes system with auto-learning.  I thought 
that I had my config setup correctly but apparently I don't.  My config 
looks like this:

##
# How we want to modify the email
rewrite_header subject [**SPAM**]
report_safe 0
#Bayes learning system
use_bayes 1
bayes_auto_learn 1
# Define the sensitivity level. Standard level is 5.
required_hits 6.8
# Enable SpamAssassin's RBL checking features :
skip_rbl_checks 0
rbl_timeout 3
num_check_received 3
score RCVD_IN_BL_SPAMCOP_NET 3
report_header 1
use_terse_report 1
##
so I thought from the reading in the FAQ and on the wiki that this would 
enable bayes, and turn on its auto_learn for spam that hits higher than 
the default of 12.  But in my logs I end up with this:

2005-05-12 23:30:33.240563500 2005-05-13 06:30:33 [88906] i: connection 
from localhost.whootis.com [127.0.0.1] at port 4737
2005-05-12 23:30:33.333094500 2005-05-13 06:30:33 [88906] i: processing 
message [EMAIL PROTECTED] for qmaild:10004.
2005-05-12 23:30:33.431814500 2005-05-13 06:30:33 [88906] i: identified 
spam (23.2/6.8) for qmaild:10004 in 0.2 seconds, 1311 bytes.
2005-05-12 23:30:33.432514500 2005-05-13 06:30:33 [88906] i: result: Y 
23 - 
BAYES_99,FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MSGID_RANDY,NORMAL_HTTP_TO_IP,RCVD_BY_IP,RCVD_DOUBLE_IP_LOOSE,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SUBJ_ILLEGAL_CHARS 
scantime=0.2,size=1311,mid=[EMAIL PROTECTED],bayes=0.999,autolearn=no

Does the autolearn=no mean that this message has not been submitted to 
bayes for auto-learn?  And if not, can someone steer me in the right 
direction for getting my config setup correctly?

Thanks very much,
Geoff Sweet


RE: Help with Bayes auto-learn

2005-05-13 Thread George Breahna
I could swear I have seen this question in at least 20 different messages,
not to mention the website.

I really recommend you research your question before asking it.

autolearn=no means that it didn't 'learn' this message.

Other possible states are 'spam', 'ham' and ... 'DISABLED'.

If autolearn were disabled, you would see this last one.
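If you want to tally these states across your spamd logs, the field is easy to pull out. A minimal sketch, assuming the "autolearn=..." token appears as in the result line quoted in this thread; the value is lower-cased so that "DISABLED" and "disabled" count the same.

```python
import re
from typing import Optional

AUTOLEARN_FIELD = re.compile(r"\bautolearn=(\w+)")


def autolearn_status(log_line: str) -> Optional[str]:
    """Pull the autolearn=... value out of a spamd 'result:' log line."""
    match = AUTOLEARN_FIELD.search(log_line)
    return match.group(1).lower() if match else None


line = ("result: Y 23 - BAYES_99,HTML_MESSAGE "
        "scantime=0.2,size=1311,bayes=0.999,autolearn=no")
print(autolearn_status(line))  # no
```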








Re: Help with Bayes auto-learn

2005-05-13 Thread wolfgang
In an older episode (Friday 13 May 2005 08:38), Geoff Sweet wrote:
 I would like to enable the Bayes system with auto-learning.  I thought 
 that I had my config setup correctly but apparently I don't.  My config 
 looks like this:
 
 ##
 # How we want to modify the email
 rewrite_header subject [**SPAM**]
 report_safe 0
 
 #Bayes learning system
 use_bayes 1
 bayes_auto_learn 1

In an older episode (Friday 13 May 2005 10:17), George Breahna wrote:
 I really recommend you research your question before asking it.

good point, anyway:

man Mail::SpamAssassin::Conf 
and
http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html
would tell you:

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number 
of ham (non-spam) and spam have been learned. The default is 200 of each ham 
and spam, but you can tune these up or down with these two settings.
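The activation rule above boils down to one comparison. A sketch using the defaults quoted here; the example numbers come from the sa-learn --dump magic output and local.cf posted elsewhere in this archive (Ronald's DB with the defaults, Pierre's DB with bayes_min_spam_num 100 and bayes_min_ham_num 60).

```python
def bayes_active(nspam: int, nham: int,
                 min_spam: int = 200, min_ham: int = 200) -> bool:
    """Bayes scores messages only once both counts reach their minimums."""
    return nspam >= min_spam and nham >= min_ham


# Ronald's DB with the defaults quoted above.
print(bayes_active(1084, 1361))  # True
# Pierre's DB with his local.cf settings.
print(bayes_active(6, 5, min_spam=100, min_ham=60))  # False
```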

for information how to learn the needed amount of mails, see

man sa-learn

regards,

wolfgang



Re: Help with Bayes auto-learn

2005-05-13 Thread Joe Zitnik

Yes, but his score list shows BAYES_99 as one of the hits, which means Bayes is active, which means it has been fed the necessary 200 spam and 200 ham. If it hadn't been fed the necessary spam and ham, it would not have been given a BAYES score at all. The fact that the mail was not autolearned could mean that it did not fall within the autolearn range OR that an identical message had already been learned. With a score like BAYES_99, it is probably the latter.


Re: Help with Bayes auto-learn

2005-05-13 Thread wolfgang
In an older episode (Friday 13 May 2005 12:26), Joe Zitnik wrote:
 Yes, but his scoring list BAYES_99 as one of the scores, which means
 bayes is active, which means it has been fed the necessary 200 spam and
 200 ham.  If it hadn't been fed the necessary spam and ham, it would not
 have been given a BAYES score at all. 

thanks for pointing that out, i had missed that.

wolfgang


Re: Help with Bayes auto-learn

2005-05-13 Thread Matt Kettler
At 02:38 AM 5/13/2005, Geoff Sweet wrote:
2005-05-12 23:30:33.432514500 2005-05-13 06:30:33 [88906] i: result: Y 23 
- 
BAYES_99,FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MSGID_RANDY,NORMAL_HTTP_TO_IP,RCVD_BY_IP,RCVD_DOUBLE_IP_LOOSE,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SUBJ_ILLEGAL_CHARS 
scantime=0.2,size=1311,mid=[EMAIL PROTECTED],bayes=0.999,autolearn=no

Does the autolearn=no mean that this message has not been submitted to 
bayes for auto-learn?  And if not, can someone steer me in the right 
direction for getting my config setup correctly?
First, I'm assuming you're using SA 3.0.0 or higher, if not, please specify 
version and I'll correct my message (some of the details differ)

That does mean the message was not autolearned. However, it does not mean 
that no messages will be autolearned. In SA 3.0 if autolearning was 
disabled, or failing, you would have seen disabled or failed, not no.

The requirements for autolearning are considerably more complex than just 
total score over xx.

The following things have to happen:
Note: ALL scores referenced below are the learning score. Learning score is 
NOT the same as the final spam score. It is the score recalculated as if 
bayes was disabled, *including* changing scoreset. Also all AWL, whitelist, 
and blacklist rules don't count towards this score.

1) total learning score over bayes_auto_learn_threshold_spam (default 12)
2) learning score of  header rules must be over 3.0
3) learning score of  body rules must be over 3.0
4) existing bayes learning must not be strongly ham (ie: don't learn as 
spam anything that would otherwise get bayes_00'ed)
5) From addresses (including Return-Path, etc) must not match a 
bayes_ignore_from statement
6) To addresses (including Cc, etc) must not match a bayes_ignore_to
statement
7) The bayes DB must not be locked by some other SA process (another 
learner, expiry, etc). Note: this test results in autolearn=failed.
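Matt's checklist can be written down as a single function. This is a simplified model, not SpamAssassin's code: "strongly ham" is assumed here to mean a Bayes probability below 0.01 (the BAYES_00 range), "over 3.0" is treated as "at least 3.0", and the ignore lists are matched as shell-style globs via Python's fnmatch, which is close to, but not necessarily identical to, how SA matches bayes_ignore_from and bayes_ignore_to.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Iterable, List


@dataclass
class Candidate:
    learning_score: float  # total recomputed without Bayes/AWL/white- and blacklists
    header_points: float   # learning score from header rules only
    body_points: float     # learning score from body rules only
    bayes_prob: float      # what the existing Bayes DB thinks of the message
    senders: List[str] = field(default_factory=list)     # From, Return-Path, ...
    recipients: List[str] = field(default_factory=list)  # To, Cc, ...


def _matches(addresses: Iterable[str], patterns: Iterable[str]) -> bool:
    """True if any address matches any glob pattern, case-insensitively."""
    pats = [p.lower() for p in patterns]
    return any(fnmatchcase(a.lower(), p) for a in addresses for p in pats)


def autolearn_as_spam(c: Candidate, ignore_from: Iterable[str] = (),
                      ignore_to: Iterable[str] = (), db_locked: bool = False,
                      threshold_spam: float = 12.0, min_points: float = 3.0,
                      strongly_ham: float = 0.01) -> str:
    """Return 'spam', 'no' or 'failed', mirroring the autolearn= log field."""
    if c.learning_score < threshold_spam:                            # 1)
        return "no"
    if c.header_points < min_points or c.body_points < min_points:  # 2), 3)
        return "no"
    if c.bayes_prob < strongly_ham:                                 # 4)
        return "no"
    if _matches(c.senders, ignore_from):                            # 5)
        return "no"
    if _matches(c.recipients, ignore_to):                           # 6)
        return "no"
    if db_locked:                                                   # 7)
        return "failed"
    return "spam"
```

Note that only the lock check produces "failed"; every other reason to skip shows up as a plain "no", which is why autolearn=no on its own says little about the configuration.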

See also:
http://wiki.apache.org/spamassassin/AutolearningNotWorking



Re: Need help with Bayes DB

2004-12-24 Thread Loren Wilton
Also make sure that you updated the DB format if you moved from 2.6x to
3.0.1.  Maybe Bayes is turned on, but every time it tastes the DB it doesn't
like the format.

Loren



Need help with Bayes DB

2004-12-23 Thread SpamAssassin User
Hello:

I am running SpamAssassin 3.0.1 and have been having a problem.  When I run 
sa-learn -D --dump magic I get the following output.

0.000  0  3  0  non-token data: bayes db version
0.000  0   1084  0  non-token data: nspam
0.000  0   1361  0  non-token data: nham
0.000  0 109079  0  non-token data: ntokens
0.000  0 1078967175  0  non-token data: oldest atime
0.000  0 1103809663  0  non-token data: newest atime
0.000  0 1103809671  0  non-token data: last journal sync atime
0.000  0 1103808704  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

Now the problem is that the numbers for nspam, nham, ntokens, and oldest atime 
NEVER change.  What could be the cause for this?  How could I fix this problem?

Thanks in advance,
Ronald Vazquez


Re: Need help with Bayes DB

2004-12-23 Thread Richard Ozer
Check to make sure that you don't have a phantom local.cf somewhere that's pointing SA 
to the wrong directory for bayes.  For instance, see if you have both a 
/etc/spamassassin and /etc/mail/spamassassin folder.

Make sure it's not putting a new bayes database in your user/.spamassassin 
directory.
Make sure that you have appropriate rights to the bayes folder and files.  I've been 
using chmod 666.
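A small script can do the first checks for you by listing any bayes_* files in the usual config and home directories, together with their permissions. The directory list is taken from this thread, and the bayes_* naming assumes the default bayes_path prefix; adjust both for your install.

```python
import glob
import os
from typing import List


def find_bayes_files(dirs: List[str]) -> List[str]:
    """List bayes_* files in each candidate directory that exists."""
    found: List[str] = []
    for d in dirs:
        path = os.path.expanduser(d)
        if os.path.isdir(path):
            found.extend(sorted(glob.glob(os.path.join(path, "bayes_*"))))
    return found


# Directories mentioned in this thread; adjust to your install.
candidates = ["/etc/spamassassin", "/etc/mail/spamassassin", "~/.spamassassin"]
for f in find_bayes_files(candidates):
    print(f, oct(os.stat(f).st_mode & 0o777))
```

More than one set of bayes_toks/bayes_seen files usually means SA and sa-learn are not looking at the same database.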

RO