Re: sa-learn - bayes training...
I just had a chance to (finally) get back to this issue. I tried your suggestion, changed the mode to 0777 and restarted spamd. Apparently nothing changed. I did, however, realize that the Bayes tests are listed in my log file, even though they are not in the headers of the msgs.

So, I have Bayes autolearn working fine. The database is also fine (> 6000 ham & spam learned). My logs show all that's expected. The message headers are missing the list of Bayes tests, but are otherwise fine. spamassassin --lint returns no errors. I have the SARE rules installed. I'm running qmail with qmail-scanner v1.25 and SA 3.0.2. Everything works fine...

Yet I still have a lot of spam (I know that's relative) slipping through, more than before this SA upgrade. To show some numbers: I used to get a couple of false negatives per day, if any, before the upgrade; now I get anywhere from half a dozen to two dozen. Still much better than the 500 without SA, but not quite fine-tuned enough for my taste.

Any suggestions as to where to look next would be appreciated.

Cheers,
Jean

Matt Kettler writes:
> Jean Caron wrote:
> > Here are the Bayes-related settings I had in there already:
> > use_bayes 1
> > bayes_path /home/bayesUID/bayes
> > bayes_file_mode 0666
> > bayes_auto_learn 1
> > Jean
>
> Suggestion: set bayes_file_mode to 0777, not 0666.
>
> The bayes_file_mode is really a mask, not literal permissions, so it won't result in executable bits being set for your Bayes files. However, this mask is sometimes used in directory creation, where the x bit is quite appropriate. This is why the default is 0700, not 0600.
Re: sa-learn - bayes training...
Jean Caron wrote:
>
> Here's the bayes related I had in there already;
> use_bayes 1
> bayes_path /home/bayesUID/bayes
> bayes_file_mode 0666
> bayes_auto_learn 1
> Jean

Suggestion: set bayes_file_mode to 0777, not 0666.

The bayes_file_mode is really a mask, not literal permissions, so it won't result in executable bits being set for your Bayes files. However, this mask is sometimes used in directory creation, where the x bit is quite appropriate. This is why the default is 0700, not 0600.
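To make the "mask, not literal permissions" point concrete, here is a small Python sketch of the arithmetic. It is only an illustration, not SpamAssassin's actual code: it assumes new files start from a base of 0666 and new directories from 0777, with bayes_file_mode ANDed on top. The function name effective_mode is made up for this example.

```python
def effective_mode(bayes_file_mode: int, is_directory: bool) -> int:
    """Return the permission bits a newly created Bayes file or directory
    would end up with, under the assumption stated above.

    Assumption (illustrative only): files start from 0666, directories
    from 0777, and bayes_file_mode is applied as an AND mask.
    """
    base = 0o777 if is_directory else 0o666
    return base & bayes_file_mode


# Compare the value from the original config, the suggested value,
# and the default mentioned in the message.
for mode in (0o666, 0o777, 0o700):
    file_bits = effective_mode(mode, is_directory=False)
    dir_bits = effective_mode(mode, is_directory=True)
    print(f"bayes_file_mode {mode:04o}: files {file_bits:04o}, directories {dir_bits:04o}")
```

Under that assumption, 0777 never puts an execute bit on a file, while 0666 would leave a newly created directory without the x bit needed to enter it.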
Re: sa-learn - bayes training...
Alright. I find it strange that the defaults don't apply to my setup, but in any case I added the following to local.cf and restarted spamd:

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_

Here are the Bayes-related settings I had in there already:

use_bayes 1
bayes_path /home/bayesUID/bayes
bayes_file_mode 0666
bayes_auto_learn 1

Jean

Kevin Peuhkurinen writes:
> Jean Caron wrote:
> > Really? I never saw a Bayes score in the header. Should ALL msgs have a Bayes score in the header?
> >
> > Here's a sample header:
> >
> > Received: from 80.231.10.208 by mail (envelope-from <[EMAIL PROTECTED]>, uid 1001) with qmail-scanner-1.25 (spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed in 3.859362 secs); 14 Apr 2005 07:18:05 -
> > X-Spam-Status: No, hits=1.5 required=2.0
> > X-Spam-Level: +
> >
> > Did I miss such an obvious switch somewhere?
> > Jean
>
> For some reason, SA is not adding the tests that the email hit to the X-Spam-Status header, as it does by default. Without this information, it's difficult to tell what is going on.
>
> Look in your local.cf file for either a "remove_header" or an "add_header" entry. Remove (or comment out) any of the former, and if you have any of the latter, make sure they read:
>
> add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
>
> After making the change, be sure to restart spamd. Then begin to monitor your false negatives. The headers should then show which tests are hit. Look for BAYES tests and see which ones they are hitting.
Re: sa-learn - bayes training...
Jean Caron wrote:
> Really? I never saw a Bayes score in the header. Should ALL msgs have a Bayes score in the header?
>
> Here's a sample header:
>
> Received: from 80.231.10.208 by mail (envelope-from <[EMAIL PROTECTED]>, uid 1001) with qmail-scanner-1.25 (spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed in 3.859362 secs); 14 Apr 2005 07:18:05 -
> X-Spam-Status: No, hits=1.5 required=2.0
> X-Spam-Level: +
>
> Did I miss such an obvious switch somewhere?
> Jean

For some reason, SA is not adding the tests that the email hit to the X-Spam-Status header, as it does by default. Without this information, it's difficult to tell what is going on.

Look in your local.cf file for either a "remove_header" or an "add_header" entry. Remove (or comment out) any of the former, and if you have any of the latter, make sure they read:

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

After making the change, be sure to restart spamd. Then begin to monitor your false negatives. The headers should then show which tests are hit. Look for BAYES tests and see which ones they are hitting.
Re: sa-learn - bayes training...
Really? I never saw a Bayes score in the header. Should ALL msgs have a Bayes score in the header?

Here's a sample header:

Received: from 80.231.10.208 by mail (envelope-from <[EMAIL PROTECTED]>, uid 1001) with qmail-scanner-1.25 (spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed in 3.859362 secs); 14 Apr 2005 07:18:05 -
X-Spam-Status: No, hits=1.5 required=2.0
X-Spam-Level: +

Did I miss such an obvious switch somewhere?

Jean

Phil Barnett writes:
> On Friday 15 April 2005 08:03 am, Jean Caron wrote:
> > Again, how can I tell for sure?
>
> Look in the header and see what the Bayes score was on the FN.
>
> --
> "In the beginning of a change, the patriot is a brave and scarce man, hated and scorned. When the cause succeeds, however, the timid join him...for then it costs nothing to be a patriot." -Mark Twain
Re: sa-learn - bayes training...
On Friday 15 April 2005 08:03 am, Jean Caron wrote:
> Again, how can I tell for sure ?

Look in the header and see what the bayes score was on the FN.

--
"In the beginning of a change, the patriot is a brave and scarce man, hated and scorned. When the cause succeeds, however, the timid join him...for then it costs nothing to be a patriot." -Mark Twain
Re: sa-learn - bayes training...
Kevin, my comments/questions are inline.

Kevin Peuhkurinen writes:
> Jean Caron wrote:
> > Kevin, your assumption is correct: user accounts are on the server and spamc is used. I already have the central DB set up using bayes_path in local.cf.
> >
> > I think what you are saying confirms what I suspected, but it's still not 100% clear. Even though I have a central DB, all users must train it individually, is that it? For example, if UserA populates the shared folders with ham and spam, respectively, from messages he/she received, and UserB trains the central DB against those msgs, will it have no effect for UserA? Must all users individually train the central DB even though they train using the same msgs from the same shared folders?
> >
> > Sorry if I seem a little dense, but I think I'm getting it. I hope!
> > Jean
>
> If you have bayes_path set, then all users should be using just the one DB, and any training that one user does will affect the results for all other users.

Hummm... That's what I *thought*, but then the results led me to believe otherwise, and now you are confirming that only one user can learn for all.

> So, presuming that the permissions on the Bayes files are set correctly so that all of your users have access to it, it would seem that you do have things set up properly.

I thought so, but something is not doing its "thing".

> It is possible that the database is corrupt.

How can I tell for sure? As far as I can tell, using spamassassin --lint, sa-learn --dump, etc., the results seem to indicate a healthy DB.

> Have you in fact determined that most or all of your false negatives are due to low Bayes scores?

Again, how can I tell for sure? My main lead here is that when I upgraded to 3.0.2, I also changed from owning the DB myself, as a regular user, to making it system-wide, owned and trained by a dedicated user. Since then, I went from a handful of false negatives a day to almost a hundred.

At first, and this is where I may have assumed wrong, I thought: alright, I have a brand-new DB and it needs to be trained, that's all. I gave it enough time and training, but it never got better. I still have way more FNs than I used to. I've also recently (this week) added the SARE rules, and the results are not much better.

Jean
Re: sa-learn - bayes training...
Jean Caron wrote:
> Kevin, your assumption is correct: user accounts are on the server and spamc is used. I already have the central DB set up using bayes_path in local.cf.
>
> I think what you are saying confirms what I suspected, but it's still not 100% clear. Even though I have a central DB, all users must train it individually, is that it? For example, if UserA populates the shared folders with ham and spam, respectively, from messages he/she received, and UserB trains the central DB against those msgs, will it have no effect for UserA? Must all users individually train the central DB even though they train using the same msgs from the same shared folders?
>
> Sorry if I seem a little dense, but I think I'm getting it. I hope!
> Jean

If you have bayes_path set, then all users should be using just the one DB, and any training that one user does will affect the results for all other users.

So, presuming that the permissions on the Bayes files are set correctly so that all of your users have access to it, it would seem that you do have things set up properly.

It is possible that the database is corrupt. Have you in fact determined that most or all of your false negatives are due to low Bayes scores?
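The permissions presumption above can be checked with a short script. This Python sketch lists the permission bits of the shared database files; it assumes that bayes_path is used as a filename prefix with suffixes such as _toks and _seen appended, so the files are matched with the glob "<bayes_path>_*". The function name bayes_file_access is invented for this example, and the path in the usage lines is the one from the configuration quoted earlier in the thread.

```python
import glob
import os
import stat
from typing import Dict


def bayes_file_access(bayes_path: str) -> Dict[str, dict]:
    """Report permission bits for every file matching <bayes_path>_*.

    Assumption: the database files share bayes_path as a common prefix.
    """
    report = {}
    for path in sorted(glob.glob(glob.escape(bayes_path) + "_*")):
        mode = stat.S_IMODE(os.stat(path).st_mode)
        report[path] = {
            "mode": f"{mode:04o}",
            # True when the group (or everyone else) can both read and write.
            "group_rw": (mode & 0o060) == 0o060,
            "other_rw": (mode & 0o006) == 0o006,
        }
    return report


if __name__ == "__main__":
    for path, info in bayes_file_access("/home/bayesUID/bayes").items():
        print(path, info)
```

If neither group_rw nor other_rw is true for a file, only its owner can update it, and messages scanned as any other user could not read or write the shared database.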
Re: sa-learn - bayes training...
Kevin, your assumption is correct: user accounts are on the server and spamc is used. I already have the central DB set up using bayes_path in local.cf.

I think what you are saying confirms what I suspected, but it's still not 100% clear. Even though I have a central DB, all users must train it individually, is that it? For example, if UserA populates the shared folders with ham and spam, respectively, from messages he/she received, and UserB trains the central DB against those msgs, will it have no effect for UserA? Must all users individually train the central DB even though they train using the same msgs from the same shared folders?

Sorry if I seem a little dense, but I think I'm getting it. I hope!

Jean

Kevin Peuhkurinen writes:
> Jean Caron wrote:
> > Folks, I searched the archive, tried different things, yet I need to ask a few questions.
> >
> > I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on Linux. Works great. Bayes auto-learns OK, and I run sa-learn from a "dedicated" user every night for ham and spam. My logs show how many msgs were inspected and how many were learned. So far so good.
> >
> > Here's the part I'm unsure of: I have one centralized Bayes DB owned by this "dedicated" user. This user runs sa-learn against two shared folders, one for ham and one for spam. All users (only a handful) may populate the shared folders. Many thousands of msgs have gone through sa-learn. I thought this was all too easy...
> >
> > My problem is that Bayes does not seem to have any effect whatsoever on the amount of spam delivered to INBOXes. I keep receiving these low-scoring spam msgs. I now suspect this centralized DB, updated by this user alone, may not produce the expected results. I've read in the archive that individual users should run cron jobs against their own ham and spam folders. The issue with this is that only one user has an actual shell defined on the system, so the others can't run cron.
> >
> > Then again, that's just a suspicion. I may be wrong, and something else may be missing or misconfigured, and that's why I'm posting this... I'm a little confused. I don't understand exactly how Bayes works, so I can't come to any helpful conclusion about my setup. Can anyone see through this and help me understand what is happening?
> >
> > Thanks in advance,
> > Jean
>
> Jean, I'm not entirely sure, based on the information you provided, how spamd is getting called, but I'm quite sure that your setup is not doing what you expect it to.
>
> I'm guessing, since you say that you are using procmail, that you have user accounts set up on the server itself and that spamc is being called as individual users from .forward files. If this is the case, then each user will have a .spamassassin/ directory in their home which will contain their own personal Bayes database. Your problem is that you have one particular user who runs sa-learn, so only their Bayes DB is being trained (other than through the auto-learning feature, that is, which is updating the individual databases).
>
> One easy option you can consider is the use of a global Bayes DB for all your users instead of each of them having their own personal DB. Bayes tends to be less effective with global rather than personal databases, but only if the individual users are able to do their own training. You could do this fairly easily by setting the "bayes_path" option in your /etc/mail/spamassassin/local.cf file and having it point to the .spamassassin/ directory of the user who is doing all the sa-learn training.
>
> Hope that helps.
> Kevin
Re: sa-learn - bayes training...
Jean Caron wrote:
> Folks, I searched the archive, tried different things, yet I need to ask a few questions.
>
> I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on Linux. Works great. Bayes auto-learns OK, and I run sa-learn from a "dedicated" user every night for ham and spam. My logs show how many msgs were inspected and how many were learned. So far so good.
>
> Here's the part I'm unsure of: I have one centralized Bayes DB owned by this "dedicated" user. This user runs sa-learn against two shared folders, one for ham and one for spam. All users (only a handful) may populate the shared folders. Many thousands of msgs have gone through sa-learn. I thought this was all too easy...
>
> My problem is that Bayes does not seem to have any effect whatsoever on the amount of spam delivered to INBOXes. I keep receiving these low-scoring spam msgs. I now suspect this centralized DB, updated by this user alone, may not produce the expected results. I've read in the archive that individual users should run cron jobs against their own ham and spam folders. The issue with this is that only one user has an actual shell defined on the system, so the others can't run cron.
>
> Then again, that's just a suspicion. I may be wrong, and something else may be missing or misconfigured, and that's why I'm posting this... I'm a little confused. I don't understand exactly how Bayes works, so I can't come to any helpful conclusion about my setup. Can anyone see through this and help me understand what is happening?
>
> Thanks in advance,
> Jean

Jean, I'm not entirely sure, based on the information you provided, how spamd is getting called, but I'm quite sure that your setup is not doing what you expect it to.

I'm guessing, since you say that you are using procmail, that you have user accounts set up on the server itself and that spamc is being called as individual users from .forward files. If this is the case, then each user will have a .spamassassin/ directory in their home which will contain their own personal Bayes database. Your problem is that you have one particular user who runs sa-learn, so only their Bayes DB is being trained (other than through the auto-learning feature, that is, which is updating the individual databases).

One easy option you can consider is the use of a global Bayes DB for all your users instead of each of them having their own personal DB. Bayes tends to be less effective with global rather than personal databases, but only if the individual users are able to do their own training. You could do this fairly easily by setting the "bayes_path" option in your /etc/mail/spamassassin/local.cf file and having it point to the .spamassassin/ directory of the user who is doing all the sa-learn training.

Hope that helps.
Kevin
sa-learn - bayes training...
Folks, I searched the archive, tried different things, yet I need to ask a few questions.

I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on Linux. Works great. Bayes auto-learns OK, and I run sa-learn from a "dedicated" user every night for ham and spam. My logs show how many msgs were inspected and how many were learned. So far so good.

Here's the part I'm unsure of: I have one centralized Bayes DB owned by this "dedicated" user. This user runs sa-learn against two shared folders, one for ham and one for spam. All users (only a handful) may populate the shared folders. Many thousands of msgs have gone through sa-learn. I thought this was all too easy...

My problem is that Bayes does not seem to have any effect whatsoever on the amount of spam delivered to INBOXes. I keep receiving these low-scoring spam msgs. I now suspect this centralized DB, updated by this user alone, may not produce the expected results. I've read in the archive that individual users should run cron jobs against their own ham and spam folders. The issue with this is that only one user has an actual shell defined on the system, so the others can't run cron.

Then again, that's just a suspicion. I may be wrong, and something else may be missing or misconfigured, and that's why I'm posting this... I'm a little confused. I don't understand exactly how Bayes works, so I can't come to any helpful conclusion about my setup. Can anyone see through this and help me understand what is happening?

Thanks in advance,
Jean