Re: howto set bayes to ignore certain patterns?
On 6/26/07 4:28 PM, Matt Kettler wrote:
> Rick van der Zwet wrote:
>>> Well, if you use SQL, you could have a script find the relevant sha1
>>> hashes and remove them.
>>>
>>> However, why do you want to do this in the first place?
>>>
>>> SA's chi-squared combining is pretty good at ignoring words that appear
>>> in both spam and nonspam...
>>>
>> Because I know, for example, some really specific words which are added
>> all the time, like footers/disclaimers/mailing list prefixes. And I don't
>> want these words to affect the Bayes score.
>>
> They really shouldn't matter.
>> If you take, for example, a small spam message, the bad/good word ratio
>> will be about 50 or more.
>>
> So? The combining is chi-squared, which will favor the "stronger" tokens
> (i.e., those close to 0 or 1.0) over the "present in everything" ones
> (i.e., those close to 0.50).
>
That I did not know, and it explains/solves it :-)

/Rick
-- 
http://rickvanderzwet.nl
howto set bayes to ignore certain patterns?
Hi all,

The Bayes module of SpamAssassin is a handy one, but I just want to help
him/her a bit by telling him/her what to ignore by default and what not.

I know which headers to ignore and set those using the
bayes_ignore_header option. But I would also like to ignore certain
(patterns of) words.

Does anyone know whether this is possible, either by configuring it
directly with some config option or by deleting the words on a
day-to-day basis using some kind of script or some other alternative?

Thanks a lot!,
/Rick
Re: howto set bayes to ignore certain patterns?
Rick van der Zwet wrote:
>> Well, if you use SQL, you could have a script find the relevant sha1
>> hashes and remove them.
>>
>> However, why do you want to do this in the first place?
>>
>> SA's chi-squared combining is pretty good at ignoring words that appear
>> in both spam and nonspam...
>>
> Because I know, for example, some really specific words which are added
> all the time, like footers/disclaimers/mailing list prefixes. And I don't
> want these words to affect the Bayes score.
>
They really shouldn't matter.

> If you take, for example, a small spam message, the bad/good word ratio
> will be about 50 or more.
>
So? The combining is chi-squared, which will favor the "stronger" tokens
(i.e., those close to 0 or 1.0) over the "present in everything" ones
(i.e., those close to 0.50).
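To see why footer words are expected not to matter, here is a minimal Python sketch of Fisher/Robinson chi-squared combining, the family of combiner SpamAssassin's Bayes code uses. The formulation follows the one popularized by Gary Robinson and SpamBayes; SpamAssassin's exact token selection and constants are not reproduced. Each token has a spam probability p; spamminess S and hamminess H come from chi-squared tests on -2 Σ ln(1 - p) and -2 Σ ln p, and the result is (S - H + 1) / 2. The names `chi2q` and `combine` are illustrative, and `min_strength` is an assumed cutoff, not a SpamAssassin setting.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    # P(X >= x2) for a chi-squared variable with an even number of degrees
    # of freedom: exp(-m) * sum_{i < dof/2} m**i / i!, with m = x2 / 2.
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs, min_strength=0.0):
    # ASSUMPTION: min_strength is an illustrative cutoff, not SpamAssassin's
    # value. Tokens with |p - 0.5| below it are ignored before combining.
    used = [min(max(p, 1e-6), 1.0 - 1e-6)
            for p in probs if abs(p - 0.5) >= min_strength]
    if not used:
        return 0.5
    n = len(used)
    # Spamminess: tokens near 1.0 make -2*sum(ln(1-p)) large.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in used), 2 * n)
    # Hamminess: tokens near 0.0 make -2*sum(ln p) large.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in used), 2 * n)
    return (s - h + 1.0) / 2.0

if __name__ == "__main__":
    spammy = [0.99, 0.99, 0.99]   # a few strong spam tokens
    footer = [0.5] * 50            # "present in everything" footer tokens
    print(combine(spammy))                             # close to 1.0
    print(combine(spammy + footer))                    # pulled toward 0.5
    print(combine(spammy + footer, min_strength=0.1))  # same as spammy alone
```

Run as-is, three tokens at 0.99 score close to 1.0. Fifty tokens at exactly 0.5 pull the raw chi-squared result noticeably back toward 0.5, so it is the step that drops near-0.5 tokens before combining that makes "present in everything" words harmless. This sketch assumes SpamAssassin applies a cutoff of that kind, which is the premise behind footers not needing special handling.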
RE: howto set bayes to ignore certain patterns?
Have you looked at sa-learn? I believe that's what you need.

Dan

-----Original Message-----
From: Rick van der Zwet
Sent: Tuesday, June 26, 2007 10:06 AM
To: Matt Kettler
Cc: users@spamassassin.apache.org
Subject: Re: howto set bayes to ignore certain patterns?

On 6/26/07 3:51 PM, Matt Kettler wrote:
>> I know which headers to ignore and set those using the
>> bayes_ignore_header option. But I would also like to ignore certain
>> (patterns of) words.
>>
> Well, if you use SQL, you could have a script find the relevant sha1
> hashes and remove them.
>
> However, why do you want to do this in the first place?
>
> SA's chi-squared combining is pretty good at ignoring words that
> appear in both spam and nonspam...

Because I know, for example, some really specific words which are added all
the time, like footers/disclaimers/mailing list prefixes. And I don't want
these words to affect the Bayes score. If you take, for example, a small
spam message, the bad/good word ratio will be about 50 or more.

/Rick
-- 
http://rickvanderzwet.nl
Re: howto set bayes to ignore certain patterns?
On 6/26/07 3:51 PM, Matt Kettler wrote:
>> I know which headers to ignore and set those using the
>> bayes_ignore_header option. But I would also like to ignore certain
>> (patterns of) words.
>>
> Well, if you use SQL, you could have a script find the relevant sha1
> hashes and remove them.
>
> However, why do you want to do this in the first place?
>
> SA's chi-squared combining is pretty good at ignoring words that appear
> in both spam and nonspam...

Because I know, for example, some really specific words which are added all
the time, like footers/disclaimers/mailing list prefixes. And I don't want
these words to affect the Bayes score. If you take, for example, a small
spam message, the bad/good word ratio will be about 50 or more.

/Rick
-- 
http://rickvanderzwet.nl
Re: howto set bayes to ignore certain patterns?
Rick van der Zwet wrote:
> Hi all,
>
> The Bayes module of SpamAssassin is a handy one, but I just want to help
> him/her a bit by telling him/her what to ignore by default and what not.
>
> I know which headers to ignore and set those using the
> bayes_ignore_header option. But I would also like to ignore certain
> (patterns of) words.
>
> Does anyone know whether this is possible, either by configuring it
> directly with some config option or by deleting the words on a
> day-to-day basis using some kind of script or some other alternative?
>
Well, if you use SQL, you could have a script find the relevant sha1
hashes and remove them.

However, why do you want to do this in the first place?

SA's chi-squared combining is pretty good at ignoring words that appear
in both spam and nonspam...
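The SQL approach can be sketched in Python. This is a minimal sketch under several assumptions: that the Bayes database is SQL-backed with a `bayes_token` table keyed by a per-user `id` and a 5-byte `token` column; that each token is stored as the last 5 bytes of the SHA-1 of the token text; and that the footer words appear in the database in exactly the form you pass in (SpamAssassin's tokenizer may change case or add prefixes, so check this against your own data first). Because only hashes are stored, word patterns cannot be matched inside the database; instead the sketch pulls candidate words out of known footer text with a regular expression, hashes them, and deletes the matching rows. `token_hash`, `words_in`, and `forget_tokens` are hypothetical helper names, and the demo uses an in-memory SQLite table as a stand-in for the real schema.

```python
import hashlib
import re
import sqlite3

def token_hash(token: str) -> bytes:
    # ASSUMPTION: each Bayes token is stored as the last 5 bytes (40 bits)
    # of the SHA-1 digest of the token text.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]

def words_in(text: str, pattern: str = r"\w+") -> set:
    # Only hashes are stored, so patterns cannot be matched in the database;
    # collect candidate words from known footer/disclaimer text instead.
    return set(re.findall(pattern, text))

def forget_tokens(conn: sqlite3.Connection, user_id: int, words) -> int:
    # Hypothetical helper: delete one user's rows for the given words and
    # return how many rows were removed (sqlite3 sums rowcount for executemany).
    rows = [(user_id, token_hash(w)) for w in words]
    cur = conn.executemany(
        "DELETE FROM bayes_token WHERE id = ? AND token = ?", rows)
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    # Stand-in for the real table: the actual SpamAssassin schema has more
    # columns and other tables that this sketch does not touch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bayes_token "
                 "(id INTEGER, token BLOB, spam_count INTEGER, ham_count INTEGER)")
    for word, spam, ham in [("disclaimer", 40, 60), ("unsubscribe", 55, 45),
                            ("viagra", 90, 1)]:
        conn.execute("INSERT INTO bayes_token VALUES (?, ?, ?, ?)",
                     (1, token_hash(word), spam, ham))
    footer = "disclaimer: to unsubscribe from this list, reply"
    removed = forget_tokens(conn, 1, words_in(footer))
    print(removed, "rows removed")  # the two footer words that were stored
```

Back up the database before trying anything like this on real data (for example with `sa-learn --backup`). The sketch does not update per-user token totals kept elsewhere in the real schema, and deleted words will simply be learned again the next time such messages are trained, which is why a script like this would have to run regularly.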