Re: Spamcheck and how it affects bayes question
Matt Kettler wrote:
> Gary Smith wrote:
> > We have a process in place using the Perl CPAN module for invoking SA. This
> > is outside of the scope of the normal mail system. Basically we use this to
> > see what scores emails would generate for some statistical stuff. The spam
> > engine this calls is set to use -100 as the score so that everything is
> > considered spam. Our production spam engine is set to 7. We are looking at
> > the score that the Perl module returns and logging it (rather than the
> > isspam flag). To complicate things a little more, we are using MySQL for the
> > bayes store. This store is also used by our production boxes. This isn't
> > the problem, just what we are doing.
> >
> > The CPAN module has this as the description:
> > public instance (\%) process (String $msg, Boolean $is_check_p)
> > Description:
> > This method makes a call to the spamd server and depending on the value of
> > C<$is_check_p> either calls PROCESS or CHECK.
> >
> > Given that the Perl call has a boolean option for PROCESS and CHECK, I would
> > assume that they make some difference, but it doesn't really say what the
> > difference is. Currently in our code we are calling it with a false value,
> > which executes the "PROCESS" command.
> >
> > What I'm wondering is, will this throw off bayes if we keep doing this, as
> > everything that SA is returning is considered spam? I'm just worried that
> > these continued tests will cause bayes to get wacky. Also, should we be
> > using PROCESS or CHECK when doing this type of check?
> >
> > Gary
>
> The bayes auto-learning system does not care what your "required_score" is
> set to, and does not care if messages are tagged as spam or not. It uses its
> own thresholds, and its own additional criteria for learning.
>
> So, feeding it lots of mail with the threshold set to -100 shouldn't matter
> at all.

If you're worried about it, set "bayes_auto_learn 0" in whatever conf file you use for your statistical setup. That way, you can take advantage of Bayes for scoring, but nothing you do on that system will affect the db.

--
Bowie
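For reference, the suggestion above could look roughly like the fragment below in the conf file used by the statistics-only spamd. This is a sketch under assumptions: which file the settings live in is not stated in the thread, and "use_bayes 1" is only spelled out for clarity (it is the default). The -100 value is the one Gary mentions.

```
# Conf fragment for the statistics-only spamd instance (file location is an assumption)
required_score    -100   # tag everything as spam, as described in the original post
use_bayes         1      # still score with the Bayes database
bayes_auto_learn  0      # but never auto-learn from mail scanned on this system
```

With auto-learning off, this instance only reads from the shared Bayes store during scanning, so the production boxes remain the only ones training it.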
RE: Spamcheck and how it affects bayes question
> The bayes auto-learning system does not care what your "required_score"
> is set to, and does not care if messages are tagged as spam or not. It
> uses its own thresholds, and its own additional criteria for learning.
>
> So, feeding it lots of mail with the threshold set to -100 shouldn't
> matter at all.

I can live with that answer. That's what I was looking for.

Thanks,

Gary
Re: Spamcheck and how it affects bayes question
Gary Smith wrote:
> We have a process in place using the Perl CPAN module for invoking SA. This
> is outside of the scope of the normal mail system. Basically we use this to
> see what scores emails would generate for some statistical stuff. The spam
> engine this calls is set to use -100 as the score so that everything is
> considered spam. Our production spam engine is set to 7. We are looking at
> the score that the Perl module returns and logging it (rather than the
> isspam flag). To complicate things a little more, we are using MySQL for the
> bayes store. This store is also used by our production boxes. This isn't
> the problem, just what we are doing.
>
> The CPAN module has this as the description:
> public instance (\%) process (String $msg, Boolean $is_check_p)
> Description:
> This method makes a call to the spamd server and depending on the value of
> C<$is_check_p> either calls PROCESS or CHECK.
>
> Given that the Perl call has a boolean option for PROCESS and CHECK, I would
> assume that they make some difference, but it doesn't really say what the
> difference is. Currently in our code we are calling it with a false value,
> which executes the "PROCESS" command.
>
> What I'm wondering is, will this throw off bayes if we keep doing this, as
> everything that SA is returning is considered spam? I'm just worried that
> these continued tests will cause bayes to get wacky. Also, should we be
> using PROCESS or CHECK when doing this type of check?
>
> Gary

The bayes auto-learning system does not care what your "required_score" is set to, and does not care if messages are tagged as spam or not. It uses its own thresholds, and its own additional criteria for learning.

So, feeding it lots of mail with the threshold set to -100 shouldn't matter at all.
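To make the point about separate thresholds concrete, here is a simplified Python model of the two decisions. It is an illustration, not SpamAssassin code: the function names are hypothetical, the constants are the documented defaults for bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0), and the header/body point check is a rough stand-in for the "additional criteria" mentioned above. The real auto-learner also computes its own score for the decision, which this sketch takes as an input.

```python
# Simplified, illustrative model; names are hypothetical, not a SpamAssassin API.
AUTOLEARN_THRESHOLD_NONSPAM = 0.1   # default bayes_auto_learn_threshold_nonspam
AUTOLEARN_THRESHOLD_SPAM = 12.0     # default bayes_auto_learn_threshold_spam
MIN_HEADER_POINTS = 3.0             # assumed extra criterion for learning as spam
MIN_BODY_POINTS = 3.0               # assumed extra criterion for learning as spam


def is_tagged_spam(score, required_score):
    """Tagging decision: the only place required_score is consulted."""
    return score >= required_score


def autolearn_decision(learn_score, header_points, body_points):
    """Auto-learn decision: note that required_score is not a parameter."""
    if learn_score < AUTOLEARN_THRESHOLD_NONSPAM:
        return "ham"
    if (learn_score >= AUTOLEARN_THRESHOLD_SPAM
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS):
        return "spam"
    return "no"


# A message scoring 5.0 is tagged differently by the two setups...
stats_tagged = is_tagged_spam(5.0, required_score=-100)
prod_tagged = is_tagged_spam(5.0, required_score=7)
# ...but the auto-learn outcome is the same in both, because it never sees required_score.
learn_result = autolearn_decision(5.0, header_points=2.0, body_points=3.0)
```

In this model a message scoring 5.0 is tagged as spam by the -100 setup and not by the production setup, yet it is not auto-learned in either case because it falls between the two learning thresholds.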