Re: Bayes DB's

2004-12-06 Thread Daryl C. W. O'Shea
Gray, Richard wrote:
> Surely that would only happen if there were equal amounts of Spam and
> ham passing through. Otherwise the token will have a tendency toward
> whichever the program has seen more of.
>
> From: Loren Wilton [mailto:[EMAIL PROTECTED]]
>
>> Assuming that the same header values appear in both spam and ham, I'd
>> expect that Bayes would conclude the token was useless for
>> classification and ignore it.
>>
>> Loren

Is there a reason why the trusted Received headers would make good Bayes
tokens?  I can't think of any.

I can see the value in the first untrusted Received header (the one added
by YOUR first server that the message hits), since you could tokenize
data such as the received date and time of day, along with which MX the
message came in on. Most mail to a lower-preference MX will be spam,
though relying on that could cause problems if your preferred MXes go down.

Daryl
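
As a rough illustration of the idea above, here is a minimal Python sketch that turns one Received header into two tokens: the receiving host (the MX the message came in on) and the hour of day. The function name, the token prefixes RCVD-BY and RCVD-HOUR, and the sample header are all hypothetical. This is not how SpamAssassin tokenizes headers, only a sketch of what such tokens could look like.

```python
import re
from email.utils import parsedate_to_datetime


def received_tokens(received_header):
    """Return illustrative tokens for one Received header value."""
    tokens = []

    # The "by <host>" clause names the server that accepted the message.
    match = re.search(r"\bby\s+([A-Za-z0-9.-]+)", received_header)
    if match:
        tokens.append("RCVD-BY:" + match.group(1).lower())

    # The timestamp follows the last semicolon in the header.
    _, sep, date_part = received_header.rpartition(";")
    if sep:
        try:
            when = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):
            when = None  # unparseable date: emit no time token
        if when is not None:
            tokens.append("RCVD-HOUR:%02d" % when.hour)

    return tokens


# Hypothetical header, using reserved example domains and addresses.
sample = (
    "from mail.example.net (mail.example.net [192.0.2.10]) "
    "by mx2.example.org (Postfix) with ESMTP id ABC123 "
    "for <user@example.org>; Mon, 6 Dec 2004 10:50:00 +0000"
)
tokens = received_tokens(sample)
```

The caveat about preferred MXes going down still applies: a token such as the one for mx2.example.org would be learned as spammy and would then count against legitimate mail that falls back to that host.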


RE: Bayes DB's

2004-12-06 Thread Gray, Richard



Surely that would only happen if there were equal amounts of Spam and
ham passing through. Otherwise the token will have a tendency toward
whichever the program has seen more of.

From: Loren Wilton [mailto:[EMAIL PROTECTED]]
Sent: 06 December 2004 10:50
To: users@spamassassin.apache.org
Subject: Re: Bayes DB's

Assuming that the same header values appear in both spam and
ham, I'd expect that Bayes would conclude the token was useless for
classification and ignore it.

        Loren
 

---
This email from dns has been validated by dnsMSS Managed Email Security and is free from all known viruses.

For further information contact [EMAIL PROTECTED]







Re: Bayes DB's

2004-12-06 Thread Loren Wilton



Assuming that the same header values appear in both spam and 
ham, I'd expect that Bayes would conclude the token was useless for 
classification and ignore it.
 
        Loren
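
The two views in this thread can be compared with a small Python sketch. It assumes a generic Bayesian filter that divides each token count by the number of messages trained in that class. That normalization is an assumption about the classifier, not a quotation of SpamAssassin's code, and the function names and the 0.1 cutoff are hypothetical.

```python
def raw_ratio(spam_hits, ham_hits):
    """Share of a token's sightings that were in spam, with no normalization."""
    return spam_hits / (spam_hits + ham_hits)


def token_spam_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Spam probability of a token, normalized by messages trained per class."""
    if total_spam == 0 or total_ham == 0:
        raise ValueError("need at least one trained message of each class")
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def is_informative(probability, min_distance=0.1):
    """Hypothetical cutoff: ignore tokens whose probability is close to 0.5."""
    return abs(probability - 0.5) >= min_distance


# Corpus skewed toward spam: 900 spam and 100 ham messages trained.
# The local server's header token appears in every one of them.
unnormalized = raw_ratio(900, 100)
p_header = token_spam_probability(900, 100, 900, 100)

# A token seen in half of the spam and in a single ham message.
p_word = token_spam_probability(450, 1, 900, 100)
```

Without normalization the header token looks 90% spammy simply because more spam was trained, which is Richard's concern. With per-class normalization it comes out at exactly 0.5 and is ignored under the cutoff, which matches the expectation above. Which of the two applies depends on how the classifier in use actually computes token probabilities.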

  ----- Original Message -----
  From: Gray, Richard
  To: users@spamassassin.apache.org
  Sent: Monday, December 06, 2004 1:46 AM
  Subject: Bayes DB's
  
  Our mailservers add their name to the Received: from header of every
  message. As far as I can see, SA detects this and uses it to create
  tokens when autolearning.
   
  Because our DB is shown more spam than ham, there are tokens in the
  database that identify messages coming from our server as more likely
  to be spam than ham. This is quite bad. Is there any way to fix or
  prevent this?
   
  Thanks,
   
  Richard