subject:"sa\-learn \- bayes training..."

Re: sa-learn - bayes training...

2005-04-25 Thread Jean Caron

I just had a chance to (finally) get back to this issue. I tried your 
suggestion, changed the mode to 0777 and re-started spamd. Apparently 
nothing changed. 

I did however realize that bayes tests are listed in my log file, even 
though they are not in the header of the msgs. 

So, I have bayes autolearn working fine. The database is also fine (> 6000
ham & spam learned). My logs show all that's expected. The message headers
are missing the list of Bayes tests, but are otherwise fine. spamassassin
--lint returns no errors. I have the SARE rules installed. I'm running qmail
with qmail-scanner v1.25 and SA 3.0.2. Everything works fine...

Yet I still have a lot of spam (I know that's relative) that slips through,
more than before this SA upgrade. To show some numbers, I used to get a
couple of false negatives per day, if any, before the upgrade; now I get
anywhere from half a dozen to two dozen. Still much better than the 500
without SA, but not quite fine-tuned enough for my taste.

Any suggestions as to where to look next would be appreciated.
Cheers,
Jean 

Matt Kettler writes:

> Jean Caron wrote:
>
>> Here's the bayes related I had in there already;
>> use_bayes 1
>> bayes_path  /home/bayesUID/bayes
>> bayes_file_mode 0666
>> bayes_auto_learn 1
>> Jean
>
> Suggestion: set bayes_file_mode to 0777 not 0666.
>
> The bayes_file_mode is really a mask not literal permissions, so it
> won't result in executable bits being set for your bayes files. However,
> this mask is sometimes used in directory creation, where the x bit is
> quite appropriate.
>
> This is why the default is 0700, not 0600.

Re: sa-learn - bayes training...

2005-04-15 Thread Matt Kettler

Jean Caron wrote:

>
> Here's the bayes related I had in there already;
> use_bayes 1
> bayes_path  /home/bayesUID/bayes
> bayes_file_mode 0666
> bayes_auto_learn 1
> Jean 

Suggestion: set bayes_file_mode to 0777 not 0666.

The bayes_file_mode is really a mask not literal permissions, so it
won't result in executable bits being set for your bayes files. However,
this mask is sometimes used in directory creation, where the x bit is
quite appropriate.

This is why the default is 0700, not 0600.
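
To make the mask idea above concrete, here is a minimal Python sketch. It is only a model of the behaviour described in this message, not SpamAssassin's own code: it assumes the execute bits are cleared when a regular file is created and kept when a directory is created. The function names are hypothetical.

```python
import stat

# Illustration only: treat bayes_file_mode as a mask. Regular files lose
# the execute bits; directories keep them (they need x to be searchable).
EXEC_BITS = 0o111


def effective_file_mode(bayes_file_mode: int) -> int:
    """Mode a regular file would end up with under this model."""
    return bayes_file_mode & ~EXEC_BITS


def effective_dir_mode(bayes_file_mode: int) -> int:
    """Mode a directory would end up with (execute/search bits kept)."""
    return bayes_file_mode


for setting in (0o666, 0o777, 0o700):
    file_bits = stat.filemode(stat.S_IFREG | effective_file_mode(setting))
    dir_bits = stat.filemode(stat.S_IFDIR | effective_dir_mode(setting))
    print(f"bayes_file_mode {setting:04o}: files {file_bits}, dirs {dir_bits}")
```

Under this model, 0666 and 0777 give identical permissions on the database files, but only 0777 yields a directory that other users can actually enter.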

Re: sa-learn - bayes training...

2005-04-15 Thread Jean Caron

Alright. I find it strange that the defaults don't apply to my setup, but in
any case I added the following to local.cf and restarted spamd.

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_

Here's the bayes related I had in there already;

use_bayes 1
bayes_path  /home/bayesUID/bayes
bayes_file_mode 0666
bayes_auto_learn 1 

Jean 

Kevin Peuhkurinen writes: 

Jean Caron wrote: 

Really ? I never saw bayes score in the header. Should ALL msgs have a
bayes score in the header ? Here's a sample header;
Received: from 80.231.10.208 by mail (envelope-from 
<[EMAIL PROTECTED]>, uid 1001) with qmail-scanner-1.25 
(spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed 
in 3.859362 secs); 14 Apr 2005 07:18:05 -
X-Spam-Status: No, hits=1.5 required=2.0
X-Spam-Level: +
Did I miss such an obvious switch somewhere ??
Jean 

For some reason, SA is not adding the tests that the email hit in the
X-Spam-Status header, as is the default. Without this information, it's
difficult to tell what is going on. Look in your local.cf file for
either a "remove_header" or "add_header" entry. Remove (or comment out)
any of the former and if you have any of the latter, make sure they read:

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

After making the change, be sure to restart spamd. Then begin to monitor
your false negatives. The headers should then show which tests are hit.
Look for BAYES tests and see which they are hitting.

Re: sa-learn - bayes training...

2005-04-15 Thread Kevin Peuhkurinen

Jean Caron wrote:
Really ? I never saw bayes score in the header. Should ALL msgs have a
bayes score in the header ? Here's a sample header;
Received: from 80.231.10.208 by mail (envelope-from 
<[EMAIL PROTECTED]>, uid 1001) with qmail-scanner-1.25 
(spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. 
Processed in 3.859362 secs); 14 Apr 2005 07:18:05 -
X-Spam-Status: No, hits=1.5 required=2.0
X-Spam-Level: +
Did I miss such an obvious switch somewhere ??
Jean

For some reason, SA is not adding the tests that the email hit in the
X-Spam-Status header, as is the default. Without this information,
it's difficult to tell what is going on. Look in your local.cf file
for either a "remove_header" or "add_header" entry. Remove (or
comment out) any of the former and if you have any of the latter, make
sure they read:

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

After making the change, be sure to restart spamd. Then begin to
monitor your false negatives. The headers should then show which tests
are hit. Look for BAYES tests and see which they are hitting.
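
Once the tests show up in the header, checking false negatives for BAYES hits can be scripted. The following is a rough Python sketch rather than an official tool. The `parse_spam_status` helper and the sample header value are made up for illustration; the sample only follows the general layout of the add_header template above.

```python
import re


def parse_spam_status(header_value: str) -> dict:
    """Split an X-Spam-Status value into verdict, score, and test names."""
    # Headers may be folded across lines; collapse all whitespace first.
    flat = " ".join(header_value.split())
    verdict = flat.split(",", 1)[0].strip()
    # Accept both "score=" and the older "hits=" spelling.
    score_match = re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", flat)
    # The test list runs until the next "name=" field or the end.
    tests_match = re.search(r"tests=(.*?)(?:\s+\w+=|$)", flat)
    tests = []
    if tests_match:
        tests = [t.strip() for t in tests_match.group(1).split(",")
                 if t.strip() and t.strip() != "none"]
    return {
        "verdict": verdict,
        "score": float(score_match.group(1)) if score_match else None,
        "tests": tests,
        "bayes": [t for t in tests if t.startswith("BAYES_")],
    }


# Hypothetical header value, folded the way mail headers often are.
sample = ("No, score=1.5 required=2.0 tests=BAYES_50,HTML_MESSAGE,\n"
          "\tMIME_HTML_ONLY autolearn=no version=3.0.2")

result = parse_spam_status(sample)
print(result["verdict"], result["score"], result["bayes"])
```

An empty "bayes" list on a message, as with the header quoted earlier in this thread, means either that no Bayes rule fired or that the tests are not being written to the header at all.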

Re: sa-learn - bayes training...

2005-04-15 Thread Jean Caron

Really ? I never saw bayes score in the header. Should ALL msgs have a bayes
score in the header ? Here's a sample header; 

Received: from 80.231.10.208 by mail (envelope-from 
<[EMAIL PROTECTED]>, uid 1001) with qmail-scanner-1.25 (spamassassin: 
3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed in 3.859362 
secs); 14 Apr 2005 07:18:05 -
X-Spam-Status: 	No, hits=1.5 required=2.0
X-Spam-Level: 	+ 

Did I miss such an obvious switch somewhere ??
Jean 

Phil Barnett writes:

> On Friday 15 April 2005 08:03 am, Jean Caron wrote:
>
>> Again, how can I tell for sure ?
>
> Look in the header and see what the bayes score was on the FN.
>
> --
>
> "In the beginning of a change, the patriot is a brave and scarce man, hated
> and scorned. When the cause succeeds, however, the timid join him...for then
> it costs nothing to be a patriot." -Mark Twain

Re: sa-learn - bayes training...

2005-04-15 Thread Phil Barnett

On Friday 15 April 2005 08:03 am, Jean Caron wrote:

> Again, how can I tell for sure ?

Look in the header and see what the bayes score was on the FN.

-- 

"In the beginning of a change, the patriot is a brave and scarce man, hated 
and scorned. When the cause succeeds, however, the timid join him...for then 
it costs nothing to be a patriot." -Mark Twain

Re: sa-learn - bayes training...

2005-04-15 Thread Jean Caron

Kevin, my comments/questions are inline.

Kevin Peuhkurinen writes:

> Jean Caron wrote:
>
>> Kevin, your assumption is correct, user accounts are on the server and
>> spamc is used. I already have the central DB setup using bayes_path in
>> local.cf.
>> I think what you are saying confirms what I suspected, but it's still not
>> 100% clear. Even though I have a central DB, all users must train it
>> individually, is that it ?
>> For example, if UserA populates the shared folders respectively with ham
>> and spam from messages he/she received, if UserB trains the central DB
>> against those msgs, it will have no effect for UserA ? All users must
>> individually train the central DB even though they train using the same
>> msgs from the same shared folders ?
>> Sorry if I seem a little dense, but I think I'm getting it. I hope !
>> Jean
>
> If you have bayes_path set, then all users should be using just the one
> DB, and any training that one user does will affect the results for all
> other users.

Hummm... That's what I *thought*, but then the results led me to believe
otherwise, and now you are confirming that only one user can learn for all.

> So, presuming that the permissions on the Bayes files are
> set correctly so that all of your users have access to it, it would seem
> that you do have things set up properly.

I thought so, but something is not doing its "thing".

> It is possible that the database is corrupt.

How can I tell for sure ? As far as I can tell, using spamassassin --lint,
sa-learn --dump, etc. the results seem to indicate a healthy DB.

> Have you in fact
> determined that most or all of your false negatives are due to low Bayes
> scores?

Again, how can I tell for sure ? My main lead here is that since I upgraded
to 3.0.2, I also changed from owning the DB myself, as a regular user, to
making it system-wide, owned and trained by a dedicated user. And since then,
I went from a handful of false negatives a day to almost a hundred. At
first, and this is where I may have assumed wrong, I thought well alright I
have a brand new DB and it needs to be trained that's all. I gave it enough
time and training, but it never got better. I still have way more FN than I
used to. I've also recently (this week) added the SARE rules, and the results
are not much better.

Jean
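
One way to put numbers on "the DB looks healthy" is to read the learned-message counters that sa-learn --dump magic prints. The Python sketch below rests on assumptions. The sample text is hypothetical and only imitates the general column layout of that output. The `parse_dump_magic` helper is made up. The 200-message thresholds mirror the documented defaults for bayes_min_ham_num and bayes_min_spam_num, so adjust them if your configuration overrides those.

```python
# Hypothetical sample imitating the layout of `sa-learn --dump magic`.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       6120          0  non-token data: nspam
0.000          0       6350          0  non-token data: nham
0.000          0     152340          0  non-token data: ntokens
"""

# Assumed defaults: Bayes is not used until this many messages are learned.
MIN_SPAM = 200
MIN_HAM = 200


def parse_dump_magic(text: str) -> dict:
    """Map each 'non-token data' label to the integer in the third column."""
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, name = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) >= 3:
            counts[name.strip()] = int(parts[2])
    return counts


def enough_training(counts: dict) -> bool:
    """True when both learned counters reach the assumed minimums."""
    return (counts.get("nspam", 0) >= MIN_SPAM
            and counts.get("nham", 0) >= MIN_HAM)


counts = parse_dump_magic(SAMPLE)
print(counts["nspam"], counts["nham"], enough_training(counts))
```

Healthy counters only show that training reached the database being dumped. They say nothing about whether spamd is reading that same database when it scans mail, so the header check is still needed.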

Re: sa-learn - bayes training...

2005-04-15 Thread Kevin Peuhkurinen

Jean Caron wrote:
Kevin, your assumption is correct, user accounts are on the server and 
spamc is used. I already have the central DB setup using bayes_path in 
local.cf.
I think what you are saying confirms what I suspected, but it's still 
not 100% clear. Even though I have a central DB, all users must train 
it individually, is that it ?
For example, if UserA populates the shared folders respectively with 
ham and spam from messages he/she received, if UserB trains the 
central DB against those msgs, it will have no effect for UserA ? All 
users must individually train the central DB even though they train 
using the same msgs from the same shared folders ?
Sorry if I seem a little dense, but I think I'm getting it. I hope !
Jean

If you have bayes_path set, then all users should be using just the one 
DB, and any training that one user does will affect the results for all 
other users.   So, presuming that the permissions on the Bayes files are 
set correctly so that all of your users have access to it, it would seem 
that you do have things set up properly.   

It is possible that the database is corrupt. Have you in fact
determined that most or all of your false negatives are due to low Bayes
scores?
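
The permissions point above can be checked with a short script. This is a sketch under stated assumptions, not a definitive test. It assumes bayes_path is a filename prefix, so that a setting of /home/bayesUID/bayes corresponds to files such as bayes_toks and bayes_seen in /home/bayesUID. It also treats "readable and writable by others" as the pass criterion, although group-based access would work just as well on many setups. The `bayes_file_report` helper is hypothetical.

```python
import glob
import os
import stat


def bayes_file_report(bayes_path: str) -> list:
    """List (path, mode string, world read+write?) for files sharing the prefix."""
    report = []
    for path in sorted(glob.glob(bayes_path + "_*")):
        mode = os.stat(path).st_mode
        world_rw = bool(mode & stat.S_IROTH) and bool(mode & stat.S_IWOTH)
        report.append((path, stat.filemode(mode), world_rw))
    return report


if __name__ == "__main__":
    # Path taken from the local.cf settings quoted in this thread.
    for path, mode, world_rw in bayes_file_report("/home/bayesUID/bayes"):
        note = "ok" if world_rw else "not read/write for all users"
        print(f"{mode} {path} {note}")
```

Run it as the user that spamd switches to when scanning mail. A file that this user cannot open at all will fail in os.stat or simply not appear, which is itself a useful hint.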

Re: sa-learn - bayes training...

2005-04-14 Thread Jean Caron

Kevin, your assumption is correct, user accounts are on the server and spamc 
is used. I already have the central DB setup using bayes_path in local.cf. 

I think what you are saying confirms what I suspected, but it's still not 
100% clear. Even though I have a central DB, all users must train it 
individually, is that it ? 

For example, if UserA populates the shared folders respectively with ham and 
spam from messages he/she received, if UserB trains the central DB against 
those msgs, it will have no effect for UserA ? All users must individually 
train the central DB even though they train using the same msgs from the 
same shared folders ? 

Sorry if I seem a little dense, but I think I'm getting it. I hope !
Jean 

Kevin Peuhkurinen writes: 

Jean Caron wrote: 

Folks,
I searched the archive, tried different things, yet I need to ask a few 
questions.
I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works 
great. Bayes auto-learns ok, I run sa-learn from a "dedicated" user every 
night for ham and spam. My logs show how many msgs were inspected and how 
many were learned. So far so good.
Here's the part I'm unsure of: I have one centralized bayes DB owned by
this "dedicated" user. This user runs sa-learn against two shared
folders, one for ham and one for spam. All users (only a handful) may
populate the shared folders. Many thousands of msgs have gone through
sa-learn. I thought this was all too easy...
My problem is bayes does not seem to have any effect whatsoever on the
amount of spam delivered to INBOXes. I keep receiving these low-score
spam msgs still.
I now suspect this centralized DB, updated by this user alone, may not
produce the expected results. I've read in the archive that individual
users should run cron jobs against their own ham and spam folders. The
issue with this is that only one user has an actual shell defined on the
system, so the others can't run cron. Then again, that's just a suspicion,
I may be wrong, and something else may be missing or misconfigured, and
that's why I'm posting this... I'm a little confused. I don't understand
how bayes works exactly, so I can't come to any helpful conclusion about
my setup.
Can anyone see through this and help me understand what is happening ?
Thanks in advance,
Jean 

Jean,
I'm not entirely sure based on the information you provided how spamd is
getting called, but I'm quite sure that your setup is not doing what you
expect it to. I'm guessing since you say that you are using procmail
that you have user accounts set up on the server itself and that spamc is
being called as individual users from .forward files. If this is the
case, then each user will have a .spamassassin/ directory in their home
which will contain their own personal Bayes database. Your problem is
that you have one particular user who runs sa-learn, so only their Bayes
DB is being trained (other than through the auto-learning feature, that
is, which is updating the individual databases).

One easy option you can consider is the use of a global Bayes DB for all
your users instead of each of them having their own personal DB. Bayes
tends to be less effective with global rather than personal databases, but
only if the individual users are able to do their own training. You
could do this fairly easily by setting the "bayes_path" option in your
/etc/mail/spamassassin/local.cf file and having it point to the .spamassassin/
directory of the user who is doing all the sa-learn training.

Hope that helps.
Kevin

Re: sa-learn - bayes training...

2005-04-13 Thread Kevin Peuhkurinen

Jean Caron wrote:
Folks,
I searched the archive, tried different things, yet I need to ask a 
few questions.
I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works 
great. Bayes auto-learns ok, I run sa-learn from a "dedicated" user 
every night for ham and spam. My logs show how many msgs were 
inspected and how many were learned. So far so good.
Here's the part I'm unsure of: I have one centralized bayes DB owned by
this "dedicated" user. This user runs sa-learn against two shared
folders, one for ham and one for spam. All users (only a handful)
may populate the shared folders. Many thousands of msgs have gone through
sa-learn. I thought this was all too easy...
My problem is bayes does not seem to have any effect whatsoever on
the amount of spam delivered to INBOXes. I keep receiving these
low-score spam msgs still.
I now suspect this centralized DB, updated by this user alone, may not
produce the expected results. I've read in the archive that individual
users should run cron jobs against their own ham and spam folders. The
issue with this is that only one user has an actual shell defined on
the system, so the others can't run cron. Then again, that's just a
suspicion, I may be wrong, and something else may be missing or
misconfigured, and that's why I'm posting this... I'm a little
confused. I don't understand how bayes works exactly, so I can't come
to any helpful conclusion about my setup.
Can anyone see through this and help me understand what is happening ?
Thanks in advance,
Jean

Jean,
I'm not entirely sure based on the information you provided how spamd is
getting called, but I'm quite sure that your setup is not doing what you
expect it to. I'm guessing since you say that you are using procmail
that you have user accounts set up on the server itself and that spamc
is being called as individual users from .forward files. If this is
the case, then each user will have a .spamassassin/ directory in their
home which will contain their own personal Bayes database. Your
problem is that you have one particular user who runs sa-learn, so only
their Bayes DB is being trained (other than through the auto-learning
feature, that is, which is updating the individual databases).

One easy option you can consider is the use of a global Bayes DB for all
your users instead of each of them having their own personal DB. Bayes
tends to be less effective with global rather than personal databases,
but only if the individual users are able to do their own training.
You could do this fairly easily by setting the "bayes_path" option in
your /etc/mail/spamassassin/local.cf file and having it point to the
.spamassassin/ directory of the user who is doing all the sa-learn training.

Hope that helps.
Kevin

sa-learn - bayes training...

2005-04-13 Thread Jean Caron

Folks, 

I searched the archive, tried different things, yet I need to ask a few 
questions. 

I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works 
great. Bayes auto-learns ok, I run sa-learn from a "dedicated" user every 
night for ham and spam. My logs show how many msgs were inspected and how 
many were learned. So far so good. 

Here's the part I'm unsure of: I have one centralized bayes DB owned by this
"dedicated" user. This user runs sa-learn against two shared folders, one
for ham and one for spam. All users (only a handful) may populate the
shared folders. Many thousands of msgs have gone through sa-learn. I thought
this was all too easy...

My problem is bayes does not seem to have any effect whatsoever on the
amount of spam delivered to INBOXes. I keep receiving these low-score spam
msgs still.

I now suspect this centralized DB, updated by this user alone, may not
produce the expected results. I've read in the archive that individual users
should run cron jobs against their own ham and spam folders. The issue with
this is that only one user has an actual shell defined on the system, so the
others can't run cron. Then again, that's just a suspicion, I may be wrong,
and something else may be missing or misconfigured, and that's why I'm
posting this... I'm a little confused. I don't understand how bayes works
exactly, so I can't come to any helpful conclusion about my setup.

Can anyone see through this and help me understand what is happening ?
Thanks in advance,
Jean
