Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Matt Kettler
Rick van der Zwet (user) wrote:
 Hi all,

   The bayes module of spamassassin is a handy one, but I just want to help
 it a bit by telling it what to ignore by default and what not.

 I know which headers to ignore and set those using the
 bayes_ignore_header tags. But I would also like to ignore certain
 (pattern of) words.

 Does anyone know whether this is possible, either by configuring it
 directly in some config, or by deleting the words on a day-to-day basis
 using some kind of script or some other alternative?
   

Well, if you use SQL, you could have a script find the relevant sha1
hashes and remove them.

However, why do you want to do this in the first place?

SA's chi-squared combining is pretty good at ignoring words that appear
in both spam and nonspam...
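
For illustration, here is a rough sketch of what such a cleanup script might look like. It rests on assumptions that are not confirmed anywhere in this thread: that the tokens live in a table named bayes_token with a token column, that the stored key is the last 5 bytes of the SHA-1 digest of the token text, and that the words passed in are exactly what the tokenizer produced (case and any prefixes included). SQLite is used only so the sketch runs on its own; check all of this against your own SQL schema before deleting anything.

```python
import hashlib
import sqlite3


def token_hash(word: str) -> bytes:
    # ASSUMPTION: the stored key is the last 5 bytes of the SHA-1 digest.
    return hashlib.sha1(word.encode("utf-8")).digest()[-5:]


def remove_tokens(conn: sqlite3.Connection, words) -> int:
    """Delete the rows whose token key matches one of the given words.

    Returns the number of rows removed.
    """
    removed = 0
    for word in words:
        cur = conn.execute(
            "DELETE FROM bayes_token WHERE token = ?", (token_hash(word),)
        )
        removed += cur.rowcount
    conn.commit()
    return removed


def make_demo_db(words) -> sqlite3.Connection:
    # HYPOTHETICAL stand-in table; a real bayes database has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        "token BLOB PRIMARY KEY, spam_count INTEGER, ham_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bayes_token VALUES (?, ?, ?)",
        [(token_hash(w), 1, 1) for w in words],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db(["disclaimer", "unsubscribe", "viagra"])
    print(remove_tokens(db, ["disclaimer", "unsubscribe"]))
```

Note that the deleted tokens will come back the next time a message containing them is learned, so a script like this would have to be run repeatedly.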




Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Rick van der Zwet
On 6/26/07 3:51 PM, Matt Kettler wrote:
 I know which headers to ignore and set those using the
 bayes_ignore_header tags. But I would also like to ignore certain
 (pattern of) words.


 Well, if you use SQL, you could have a script find the relevant sha1
 hashes and remove them.
 
 However, why do you want to do this in the first place?
 
 SA's chi-squared combining is pretty good at ignoring words that appear
 in both spam and nonspam...
Because I know, for example, some really specific words which are added all
the time, like footers/disclaimers/mailing-list prefixes. And I don't want
these words to affect the bayes score.

If you take for example a small spam message the ratio bad/good words
will be about 50 or more.

/Rick
-- 
http://rickvanderzwet.nl


RE: howto set bayes to ignore certain patterns?

2007-06-26 Thread Dan Barker
Have you looked at sa-learn? I believe that's what you need.

Dan 

-----Original Message-----
From: Rick van der Zwet [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 26, 2007 10:06 AM
To: Matt Kettler
Cc: users@spamassassin.apache.org
Subject: Re: howto set bayes to ignore certain patterns?

On 6/26/07 3:51 PM, Matt Kettler wrote:
 I know which headers to ignore and set those using the 
 bayes_ignore_header tags. But I would also like to ignore certain 
 (pattern of) words.


 Well, if you use SQL, you could have a script find the relevant sha1 
 hashes and remove them.
 
 However, why do you want to do this in the first place?
 
 SA's chi-squared combining is pretty good at ignoring words that 
 appear in both spam and nonspam...
Because I know, for example, some really specific words which are added all
the time, like footers/disclaimers/mailing-list prefixes. And I don't want
these words to affect the bayes score.

If you take for example a small spam message the ratio bad/good words will
be about 50 or more.

/Rick
--
http://rickvanderzwet.nl



Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Matt Kettler
Rick van der Zwet wrote:
 Well, if you use SQL, you could have a script find the relevant sha1
 hashes and remove them.

 However, why do you want to do this in the first place?

 SA's chi-squared combining is pretty good at ignoring words that appear
 in both spam and nonspam...
 
 Because I know, for example, some really specific words which are added all
 the time, like footers/disclaimers/mailing-list prefixes. And I don't want
 these words to affect the bayes score.
   
They really shouldn't matter.
 If you take for example a small spam message the ratio bad/good words
 will be about 50 or more.
   
So? The combining is chi-squared, which will favor the stronger tokens
(i.e. those close to 0 or 1.0) over the present-in-everything ones (i.e.
those close to 0.50).
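
To make that concrete, here is a small sketch of chi-squared (Fisher/Robinson style) combining. It is a textbook version written for illustration and is not taken from the SpamAssassin source; the function names and the example token probabilities are made up. It combines five strong spam tokens on their own, and then again padded with fifty neutral tokens at 0.5.

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of a chi-squared variable (dof must be even)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs) -> float:
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    # Evidence for spam: grows when many tokens are close to 1.0.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: grows when many tokens are close to 0.0.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    strong = [0.99] * 5            # hypothetical strong spam tokens
    padded = strong + [0.5] * 50   # same, plus footer-like neutral tokens
    print(combine(strong))
    print(combine(padded))
```

In this sketch a token at exactly 0.5 adds the same amount to both sums, so it cannot push the result toward spam or ham by itself. The padded score should land somewhat closer to 0.5 than the unpadded one, but it stays on the spam side because the strong tokens decide the direction.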