Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Rick van der Zwet
On 6/26/07 4:28 PM, Matt Kettler wrote:
> Rick van der Zwet wrote:
>>> Well, if you use SQL, you could have a script find the relevant sha1
>>> hashes and remove them.
>>>
>>> However, why do you want to do this in the first place?
>>>
>>> SA's chi-squared combining is pretty good at ignoring words that appear
>>> in both spam and nonspam...
>>> 
>> Because I know, for example, some really specific words which are added all
>> the time, like footers/disclaimers/mailing-list prefixes. And I don't want
>> these words to affect the bayes score.
>>   
> They really shouldn't matter.
>> If you take for example a small spam message the ratio bad/good words
>> will be about 50 or more.
>>   
> So? The combining is chi-squared, which will favor the "stronger" tokens
> (i.e. those close to 0 or 1.0) over the "present in everything" ones (i.e.
> those close to 0.50).
> 
That I did not know, and it explains/solves it :-)
/Rick

-- 
http://rickvanderzwet.nl


howto set bayes to ignore certain patterns?

2007-06-26 Thread Rick van der Zwet
Hi all,

The Bayes module of SpamAssassin is a handy one, but I just want to help
it a bit by telling it what to ignore by default and what not.

I know which headers to ignore and set those using the
bayes_ignore_header tags. But I would also like to ignore certain
(pattern of) words.

Does anyone know whether this is possible, either by configuring it
directly in some config or by deleting the words on a day-to-day basis using
some kind of script or some other alternative?

Thanks a lot!,
/Rick


Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Matt Kettler
Rick van der Zwet wrote:
>> Well, if you use SQL, you could have a script find the relevant sha1
>> hashes and remove them.
>>
>> However, why do you want to do this in the first place?
>>
>> SA's chi-squared combining is pretty good at ignoring words that appear
>> in both spam and nonspam...
>> 
> Because I know, for example, some really specific words which are added all
> the time, like footers/disclaimers/mailing-list prefixes. And I don't want
> these words to affect the bayes score.
>   
They really shouldn't matter.
> If you take for example a small spam message the ratio bad/good words
> will be about 50 or more.
>   
So? The combining is chi-squared, which will favor the "stronger" tokens
(i.e. those close to 0 or 1.0) over the "present in everything" ones (i.e.
those close to 0.50).
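
To make the point above concrete, here is a minimal Python sketch of
Fisher-style chi-squared combining. It illustrates the general technique
and is not SpamAssassin's actual code; the names chi2q and combine and the
sample probabilities are made up for the example. It shows that tokens at
exactly 0.50 add the same amount to the spam and ham statistics, so on
their own they yield 0.5, and that a few strong tokens keep more weight
than they would under plain averaging.

```python
import math


def chi2q(x, dof):
    """Chi-squared survival function P(X >= x) for an even number of
    degrees of freedom (fine for the small inputs used here)."""
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    # Evidence for spam comes from tokens near 1.0, for ham from tokens near 0.
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam - ham + 1.0) / 2.0


strong = [0.99, 0.99, 0.99]   # hypothetical spammy tokens
neutral = [0.5] * 20          # hypothetical footer/disclaimer tokens

neutral_score = combine(neutral)
mixed_score = combine(strong + neutral)
mean_score = sum(strong + neutral) / len(strong + neutral)

print("neutral only:", neutral_score)
print("chi-squared :", mixed_score)
print("plain mean  :", mean_score)
```

Note that with a very large number of neutral tokens this plain combiner
still drifts back toward 0.5, so the sketch only illustrates the direction
of the effect, not how any particular SpamAssassin version behaves.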




RE: howto set bayes to ignore certain patterns?

2007-06-26 Thread Dan Barker
Have you looked at sa-learn? I believe that's what you need.

Dan 

-----Original Message-----
From: Rick van der Zwet [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 26, 2007 10:06 AM
To: Matt Kettler
Cc: users@spamassassin.apache.org
Subject: Re: howto set bayes to ignore certain patterns?

On 6/26/07 3:51 PM, Matt Kettler wrote:
>> I know which headers to ignore and set those using the 
>> bayes_ignore_header tags. But I would also like to ignore certain 
>> (pattern of) words.
>>

> Well, if you use SQL, you could have a script find the relevant sha1 
> hashes and remove them.
> 
> However, why do you want to do this in the first place?
> 
> SA's chi-squared combining is pretty good at ignoring words that 
> appear in both spam and nonspam...
Because I know, for example, some really specific words which are added all the
time, like footers/disclaimers/mailing-list prefixes. And I don't want these
words to affect the bayes score.

If you take for example a small spam message the ratio bad/good words will
be about 50 or more.

/Rick
--
http://rickvanderzwet.nl



Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Rick van der Zwet
On 6/26/07 3:51 PM, Matt Kettler wrote:
>> I know which headers to ignore and set those using the
>> bayes_ignore_header tags. But I would also like to ignore certain
>> (pattern of) words.
>>

> Well, if you use SQL, you could have a script find the relevant sha1
> hashes and remove them.
> 
> However, why do you want to do this in the first place?
> 
> SA's chi-squared combining is pretty good at ignoring words that appear
> in both spam and nonspam...
Because I know, for example, some really specific words which are added all
the time, like footers/disclaimers/mailing-list prefixes. And I don't want
these words to affect the bayes score.

If you take for example a small spam message the ratio bad/good words
will be about 50 or more.

/Rick
-- 
http://rickvanderzwet.nl


Re: howto set bayes to ignore certain patterns?

2007-06-26 Thread Matt Kettler
Rick van der Zwet (user) wrote:
> Hi all,
>
> The Bayes module of SpamAssassin is a handy one, but I just want to help
> it a bit by telling it what to ignore by default and what not.
>
> I know which headers to ignore and set those using the
> bayes_ignore_header tags. But I would also like to ignore certain
> (pattern of) words.
>
> Does anyone know whether this is possible, either by configuring it
> directly in some config or by deleting the words on a day-to-day basis using
> some kind of script or some other alternative?
>   

Well, if you use SQL, you could have a script find the relevant sha1
hashes and remove them.

However, why do you want to do this in the first place?

SA's chi-squared combining is pretty good at ignoring words that appear
in both spam and nonspam...
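
For what it's worth, the SQL route mentioned above could look roughly like
the Python sketch below. Everything specific in it is an assumption to
verify against your own install: that tokens live in a table named
bayes_token with a token column, and that the stored key is the last 5
bytes of the word's SHA-1 digest. The helper names token_hash and
remove_tokens are made up, and an in-memory SQLite database stands in for
the real MySQL/PostgreSQL backend so the example runs on its own.

```python
import hashlib
import sqlite3


def token_hash(word):
    """ASSUMPTION: the stored key is the last 5 bytes of the SHA-1 digest."""
    return hashlib.sha1(word.encode("utf-8")).digest()[-5:]


def remove_tokens(conn, words):
    """Delete the rows for the given words; returns the number removed."""
    cur = conn.cursor()
    cur.executemany(
        "DELETE FROM bayes_token WHERE token = ?",
        [(token_hash(w),) for w in words],
    )
    conn.commit()
    return cur.rowcount


# Stand-in database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_token ("
    "id INTEGER, token BLOB, spam_count INTEGER, "
    "ham_count INTEGER, atime INTEGER)"
)
for word in ("disclaimer", "unsubscribe", "viagra"):
    conn.execute(
        "INSERT INTO bayes_token VALUES (1, ?, 5, 5, 0)",
        (token_hash(word),),
    )

deleted = remove_tokens(conn, ["disclaimer", "unsubscribe"])
remaining = conn.execute("SELECT COUNT(*) FROM bayes_token").fetchone()[0]
print(deleted, remaining)
```

Two caveats: the tokenizer may transform words before hashing them, so the
hash of a raw word is not guaranteed to match a stored row, and the sketch
does not touch any per-user token totals kept elsewhere in the database.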



