Re: Spamcheck and how it affects bayes question

2009-07-22 Thread Bowie Bailey

Matt Kettler wrote:

> Gary Smith wrote:
>> We have a process in place using the Perl CPAN module for invoking SA.  This is
>> outside the scope of the normal mail system.  Basically, we use it to see
>> what scores emails would generate, for some statistical work.  The spam engine
>> this calls is set to use -100 as the required score, so that everything is
>> considered spam.  Our production spam engine is set to 7.  We are looking at
>> the score that the Perl module returns and logging it (rather than the isspam
>> flag).  To complicate things a little more, we are using MySQL for the bayes
>> store.  This store is also used by our production boxes.  This isn't the
>> problem, just what we are doing.
>>
>> The CPAN module has this as the description:
>> public instance (\%) process (String $msg, Boolean $is_check_p)
>> Description:
>> This method makes a call to the spamd server and depending on the value of
>> C<$is_check_p> either calls PROCESS or CHECK.
>>
>> Given that the Perl call has a boolean option for PROCESS and CHECK, I would
>> assume that they make some difference, but it really doesn't say what the
>> difference is.  Currently our code calls it with a false value, which
>> executes the "PROCESS" command.
>>
>> What I'm wondering is: will this throw off bayes if we keep doing it, given
>> that everything SA returns is considered spam?  I'm just worried that these
>> continued tests will cause bayes to get wacky.  Also, should we be using
>> PROCESS or CHECK for this type of check?
>>
>> Gary
>
> The bayes auto-learning system does not care what your "required_score"
> is set to, and does not care if messages are tagged as spam or not. It
> uses its own thresholds, and its own additional criteria for learning.
>
> So, feeding it lots of mail with the threshold set to -100 shouldn't
> matter at all.
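To illustrate what "its own thresholds" means, here is a simplified Python model of the threshold-based auto-learn decision. It is a sketch, not SpamAssassin's code. The 0.1 and 12.0 values are, as far as I know, the documented defaults of `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` (AutoLearnThreshold plugin). The 3-point header/body minimums and the function name `autolearn_decision` are assumptions for illustration. Real SpamAssassin computes the learning score itself (leaving out Bayes rules and rules flagged `noautolearn`), and newer versions add further conditions; the sketch simply takes that score as an input.

```python
# Simplified, hypothetical model of SpamAssassin's threshold-based auto-learn
# decision. Not the real implementation; details vary by SA version.
from typing import Optional

NONSPAM_THRESHOLD = 0.1   # assumed default of bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0     # assumed default of bayes_auto_learn_threshold_spam
MIN_HEADER_POINTS = 3.0   # assumption: header points needed to learn as spam
MIN_BODY_POINTS = 3.0     # assumption: body points needed to learn as spam


def autolearn_decision(learn_score: float, header_points: float,
                       body_points: float, required_score: float) -> Optional[str]:
    """Return "ham", "spam", or None (message is not auto-learned).

    required_score is accepted only to make the point explicit:
    it is never consulted.
    """
    del required_score  # intentionally unused
    if learn_score < NONSPAM_THRESHOLD:
        return "ham"
    if (learn_score >= SPAM_THRESHOLD
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS):
        return "spam"
    return None


# The same message gets the same decision with required_score 7 or -100.
for req in (7.0, -100.0):
    print(req, autolearn_decision(2.5, 1.0, 1.5, req))
```

Because `required_score` never enters the decision, the -100 setting on the statistics box and the 7 on production should lead to the same learning behaviour for the same message.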
  


If you're worried about it, set "bayes_auto_learn 0" in whatever conf 
file you use for your statistical setup.  That way, you can take 
advantage of Bayes for scoring, but nothing you do on that system will 
affect the db.
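A minimal sketch of what that conf file might contain, assuming a dedicated config read only by the spamd used for the statistical runs (the file name and location are up to you; the values come from this thread):

```
# Hypothetical config for the statistics-only spamd, not production.
# Tag everything as spam; only the numeric score is logged.
required_score    -100
# Keep Bayes contributing to the score (this is the default).
use_bayes         1
# Never auto-learn into the shared MySQL Bayes store.
bayes_auto_learn  0
```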


--
Bowie


RE: Spamcheck and how it affects bayes question

2009-07-21 Thread Gary Smith
> The bayes auto-learning system does not care what your "required_score"
> is set to, and does not care if messages are tagged as spam or not. It
> uses its own thresholds, and its own additional criteria for learning.
> 
> So, feeding it lots of mail with the threshold set to -100 shouldn't
> matter at all.

I can live with that answer.  That's what I was looking for.

Thanks, 

Gary


Re: Spamcheck and how it affects bayes question

2009-07-21 Thread Matt Kettler
Gary Smith wrote:
> We have a process in place using the Perl CPAN module for invoking SA.  This 
> is outside the scope of the normal mail system.  Basically, we use it to 
> see what scores emails would generate, for some statistical work.  The spam 
> engine this calls is set to use -100 as the required score, so that 
> everything is considered spam.  Our production spam engine is set to 7.  We 
> are looking at the score that the Perl module returns and logging it (rather 
> than the isspam flag).  To complicate things a little more, we are using 
> MySQL for the bayes store.  This store is also used by our production boxes.  
> This isn't the problem, just what we are doing.
>
> The CPAN module has this as the description:
> public instance (\%) process (String $msg, Boolean $is_check_p)
> Description:
> This method makes a call to the spamd server and depending on the value of
> C<$is_check_p> either calls PROCESS or CHECK.
>
> Given that the Perl call has a boolean option for PROCESS and CHECK, I would 
> assume that they make some difference, but it really doesn't say what the 
> difference is.  Currently our code calls it with a false value, which 
> executes the "PROCESS" command.
>
> What I'm wondering is: will this throw off bayes if we keep doing it, given 
> that everything SA returns is considered spam?  I'm just worried that these 
> continued tests will cause bayes to get wacky.  Also, should we be using 
> PROCESS or CHECK for this type of check?
>
> Gary
>
The bayes auto-learning system does not care what your "required_score"
is set to, and does not care if messages are tagged as spam or not. It
uses its own thresholds, and its own additional criteria for learning.

So, feeding it lots of mail with the threshold set to -100 shouldn't
matter at all.
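The PROCESS vs CHECK question was not answered in the thread. As a hedged sketch, the Python below models the spamd wire protocol that `process()` in the CPAN client speaks underneath, assuming protocol version SPAMC/1.5 and a response carrying a `Spam: True ; <score> / <threshold>` header. `build_request`, `parse_verdict` and `Verdict` are hypothetical helpers, and no network I/O is done. As far as I know, both methods run the same scan; CHECK returns only the verdict headers, while PROCESS also sends back the full rewritten message. For logging scores, CHECK therefore looks sufficient and transfers less data.

```python
# Hypothetical helpers modelling the spamd request/response format.
# Assumes SPAMC/1.5 and a "Spam: <flag> ; <score> / <threshold>" header.
import re
from typing import NamedTuple


def build_request(message: bytes, is_check: bool) -> bytes:
    """Build a spamd request: CHECK returns only the verdict,
    PROCESS also returns the message rewritten by SpamAssassin."""
    method = "CHECK" if is_check else "PROCESS"
    head = f"{method} SPAMC/1.5\r\nContent-length: {len(message)}\r\n\r\n"
    return head.encode("ascii") + message


class Verdict(NamedTuple):
    is_spam: bool
    score: float
    threshold: float


# Accept True/False and, to be lenient, Yes/No as the flag.
SPAM_HEADER = re.compile(
    r"^Spam:\s*(True|False|Yes|No)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.IGNORECASE | re.MULTILINE)


def parse_verdict(response: bytes) -> Verdict:
    """Extract the Spam: header from the header block of a spamd response."""
    headers = response.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    m = SPAM_HEADER.search(headers)
    if m is None:
        raise ValueError("no Spam: header in spamd response")
    flag, score, threshold = m.groups()
    return Verdict(flag.lower() in ("true", "yes"), float(score), float(threshold))


v = parse_verdict(b"SPAMD/1.1 0 EX_OK\r\nSpam: True ; 2.3 / -100.0\r\n\r\n")
print(v.score, v.is_spam)  # 2.3 True
```

With a threshold of 7 the same message would come back as `Spam: False ; 2.3 / 7.0`: the score is identical and only the flag changes, which is why logging the score rather than isspam gives numbers comparable with production.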