Fighting ham

2007-04-18 Thread Robert Fitzpatrick
Our bayes was apparently giving negative scores incorrectly, and I
re-built it since it was not effective and was letting through a lot of
spam. I didn't realize it, but it seems those negative scores were
keeping SA from applying other tests? Since fixing bayes, we are blocking
so much ham it is not funny. The rules below are the ones I have
basically had to disable. We run Rules Du Jour, but only the zero level
rules; those are the only updates besides bayes, plus KAM.cf and
Botnet.cf. Granted, Botnet.cf blocks quite a few, but I understand why.
I don't know if any of these rules are part of RDJ, but why is so much
ham being hit by these rules alone? Does SA with updates and these rules
hit so much ham for others? We are constantly getting complaints about
our over-aggressive spam filters.

score PART_CID_STOCK 0
score PART_CID_STOCK_LESS 0
score TVD_FW_GRAPHIC_ID1 0
score TVD_FW_GRAPHIC_ID3 0
score TVD_FW_GRAPHIC_ID3_2 0
score MY_CID_AND_STYLE 0

-- 
Robert



Re: Fighting ham

2007-04-18 Thread Craig Carriere
Robert:

It sounds like your problem rests with your bayes database.  Some SA
rules will fire on almost all mail, but a properly trained bayes filter
should be able to reduce your scores to under your spam threshold.  None
of these rules is scored very aggressively, so I am surprised that they
are pushing you over your spam threshold.  How have you trained bayes
with your spam and ham mail?  Also, I think that the default SA setting
of 200 spam and 200 ham is a little low, and I do not regard bayes as
truly effective until about 1000 messages of each kind are learned.
That being said, I would reduce, and have reduced, the default score for
Botnet from 5.0 to 3.0.  Also, if you run the 00_ version of Fred's
rules, note that many of them are very aggressively scored.  I
personally do not let any rule score over 3.0, except some network
tests, so that bayes can recover the mail from being listed as a FP.

Best
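
The adjustments described above could be sketched in local.cf roughly as
follows. This is only an illustration, assuming the Botnet plugin's main
rule is named BOTNET and that the "200 spam and 200 ham" setting refers
to the bayes_min_spam_num and bayes_min_ham_num options; the values are
the ones mentioned in the message, not tested recommendations.

```
# local.cf sketch (assumptions: rule name BOTNET, option names below)

# Do not let bayes score mail until more messages of each kind are learned
# (the default for both options is 200).
bayes_min_ham_num   1000
bayes_min_spam_num  1000

# Lower the Botnet score from its default of 5.0 so that a single hit
# cannot carry a message over the threshold on its own.
score BOTNET 3.0
```

After changing local.cf, running `spamassassin --lint` is a quick way to
check that the file still parses.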



Re: Fighting ham

2007-04-18 Thread Robert Fitzpatrick
On Wed, 2007-04-18 at 10:23 -0500, Craig Carriere wrote:
> Robert:
> 
> It sounds like your problem rests with your bayes database.  Some SA
> rules will fire on almost all mail, but a properly trained bayes filter
> should be able to reduce your scores to under your spam threshold.  None
> of these scores rate out very aggressively so I am surprised that these
> are pushing you over your spam threshold.  How have you trained bayes
> with you spam and ham mail?  Also I think that the default SA setting of
> 200 spam and 200 ham is a little low and do not regard bayes as truly
> effective until about 1000 message of each kind are learned.  That being
> said I would, and have, reduced the default score for Botnet from 5.0 to
> 3.0.  Also, if your run the 00_ version of Fred's rules note that many
> of them are very aggressively scored.  I personally do not let any rule
> score at over 3.0, except some network test, to allow bayes to recover
> the mail from listing as a FP.
> 

Thanks, we are rebuilding bayes and now have it in SQL with auto learn
on, is that good? It now has over 25K spam, but just 180 ham. I have
plenty of ham of my own; if I learn all of it, will it affect bayes that
it all comes from just a few different addresses?

-- 
Robert
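
For readers following along, a setup like the one described (bayes stored
in SQL with auto learn enabled) might look roughly like this in local.cf.
This is a sketch under assumptions: the DSN, database name, username, and
password are placeholders and do not come from this thread.

```
# local.cf sketch: bayes in SQL with auto-learn on.
# DSN, username, and password below are hypothetical placeholders.
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn       DBI:mysql:sa_bayes:localhost
bayes_sql_username  sa_user
bayes_sql_password  changeme

# Let SA feed very high- and very low-scoring mail to bayes by itself.
bayes_auto_learn    1
```

The learned spam and ham counts can be checked with
`sa-learn --dump magic`, and a mailbox of known-good mail can be fed in
with `sa-learn --ham`.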



Re: Fighting ham

2007-04-18 Thread Faisal N Jawdat

On Apr 18, 2007, at 4:26 PM, Robert Fitzpatrick wrote:
> Thanks, we are rebuilding bayes and now have in SQL with auto learn
> on, is that good? Now has over 25K spam, but just 180 ham.


You *really* want to train with more ham than spam.

-faisal





Re: Fighting ham

2007-04-19 Thread Duane Hill

On Wed, 18 Apr 2007, Faisal N Jawdat wrote:

> On Apr 18, 2007, at 4:26 PM, Robert Fitzpatrick wrote:
>> Thanks, we are rebuilding bayes and now have in SQL with auto learn
>> on, is that good? Now has over 25K spam, but just 180 ham.
>
> You *really* want to train with more ham than spam.


I have a hard time believing auto learn could be so off-balance. I had 
auto learn turned on here once, and the spam and ham counts were usually 
within 200-300 messages of each other. Before I turned auto learn off, 
the bayes_token table had grown to over 85 million records in just over 
three weeks. We ended up letting our customers choose whether they 
wanted to use auto learn or not by using the sasql plugin for 
SquirrelMail.
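
Turning auto learn off site-wide, as described above, is a one-line
change in local.cf. This is a minimal sketch; how the per-user choice
was wired up through the sasql plugin is not shown in this thread.

```
# local.cf sketch: disable automatic bayes learning.
# Training then has to be done manually, for example with sa-learn.
bayes_auto_learn 0
```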


Re: Fighting ham

2007-04-19 Thread Craig Carriere
Does this really mean that auto-learn is "out of balance"?  My first
guess is that this site probably relies only on SA to combat spam and
does little at the MTA level to reject UBE.  It may even run a catch-all
account, which would markedly increase its spam count if it is not
rejecting mail for non-existent users.  At my small mail server, even
with conservative MTA restrictions in place, our spam hits outnumber ham
by probably 4-5 to 1.  It is just the nature of the beast.  I do agree
that he needs to manually train his bayes database and probably keep
feeding ham into the bayes engine after it starts to fire.

As an aside do you use any MTA restrictions and/or greylisting?

Best



Re: Fighting ham

2007-04-19 Thread Duane Hill

On Thu, 19 Apr 2007, Craig Carriere wrote:


> Does this really mean that auto-learn is "out of balance"?  My first
> guess is that this site probably relies only on SA to combat spam and
> does little at the MTA level to reject UBE mail.  They may even run a
> catch-all account which would markedly increase his spam count if he is
> not rejecting for non-existent users.  At my small mail server even with
> MTA restrictions, conservative ones, in place our spam hits out number
> ham by probably 4-5 to 1.  It is just the nature of the beast.  I do
> agree that he needs to manually train his bayes bases and probably keep
> feeding ham into the bayes engine. after it starts to fire.
>
> As an aside do you use any MTA restrictions and/or greylisting?


I'm using Postfix+ClamAV+SA on our two border filter servers. Roughly 95% 
of all inbound messages are weeded out before getting to the internal 
server our customers use.


I use a couple of internal blacklists and greylisting. I had also set 
the following values in local.cf:


  bayes_auto_learn_threshold_nonspam 0.01
  bayes_auto_learn_threshold_spam 18.0

Their normal defaults are 0.1 and 12.0 respectively. I had set stricter 
thresholds for auto learn because you have hardly any control over which 
messages get learned in either direction. Some others on this list have 
the auto learn values set even higher.


As far as the numbers mentioned by the OP, 25,000 spam to 180 ham? That is 
a lot more than your ~5 to 1. I would not have suspected auto learn to be 
that far off.