ham source for site-wide bayes?

2015-05-20 Thread Steve Rainwater
I've set up spamassassin with a site-wide bayes configuration. I have
some spamtrap email addresses that supply fresh spam into bayes for
training on a cron job. However, from what I've read, bayes needs to
have ongoing ham as well as spam for training in order to work well.
What's the usual method of supplying the ham? Does that have to be done
manually (how often?) or has anyone come up with a way to automatically
supply ham?

I have the spamtrap email boxes that receive spam-only but all the real
email addresses on the server receive a mix of ham and spam, which is
why I need spamassassin in the first place :)  I can't find anything in
spamassassin docs so far that explains a non-manual way of supplying
ham. Have I missed something? Is there some sort of service where I can
subscribe to an updated ham corpus automatically like with the clamav
database? 

-Steve




Re: ham source for site-wide bayes?

2015-05-20 Thread Kevin A. McGrail

On 5/20/2015 12:29 PM, Steve Rainwater wrote:

> I've set up spamassassin with a site-wide bayes configuration. I have
> some spamtrap email addresses that supply fresh spam into bayes for
> training on a cron job. However, from what I've read, bayes needs to
> have ongoing ham as well as spam for training in order to work well.
> What's the usual method of supplying the ham? Does that have to be done
> manually (how often?) or has anyone come up with a way to automatically
> supply ham?
>
> I have the spamtrap email boxes that receive spam-only but all the real
> email addresses on the server receive a mix of ham and spam, which is
> why I need spamassassin in the first place :)  I can't find anything in
> spamassassin docs so far that explains a non-manual way of supplying
> ham. Have I missed something? Is there some sort of service where I can
> subscribe to an updated ham corpus automatically like with the clamav
> database?

One way people often supply ham is to use sent items from your legit users.
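
As a rough illustration of the sent-items approach, here is a minimal Python sketch that finds each user's Sent folder and trains it as ham with `sa-learn --ham`. The directory layout (`<mail_root>/<user>/Maildir/.Sent/cur`), the folder name `.Sent`, and the helper names `sent_dirs`, `ham_command` and `train_ham` are assumptions made for this example, not anything stated in the thread; adjust them to your own mail store. For a site-wide database it would need to run as whichever user owns the bayes data.

```python
import subprocess
from pathlib import Path


def sent_dirs(mail_root, sent_folder=".Sent"):
    """Return every user's Sent 'cur' directory under mail_root.

    Assumes a hypothetical layout: <mail_root>/<user>/Maildir/<sent_folder>/cur
    """
    root = Path(mail_root)
    pattern = f"*/Maildir/{sent_folder}/cur"
    return sorted(p for p in root.glob(pattern) if p.is_dir())


def ham_command(directory):
    """Build the sa-learn invocation that trains one directory as ham."""
    return ["sa-learn", "--ham", str(directory)]


def train_ham(mail_root, dry_run=True):
    """Train all Sent folders as ham.

    With dry_run=True (the default) nothing is executed; the commands
    that would be run are only returned for inspection.
    """
    commands = [ham_command(d) for d in sent_dirs(mail_root)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `train_ham("/var/vmail/example.org")` (a hypothetical path) returns the list of commands without running them; pass `dry_run=False` from a cron job to actually train. Sent mail is only good ham as long as the sending accounts are not compromised, so it is worth checking what ends up in those folders.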

Regards,
KAM


Re: ham source for site-wide bayes?

2015-05-20 Thread Axb

On 20.05.2015 18:29, Steve Rainwater wrote:

> I've set up spamassassin with a site-wide bayes configuration. I have
> some spamtrap email addresses that supply fresh spam into bayes for
> training on a cron job. However, from what I've read, bayes needs to
> have ongoing ham as well as spam for training in order to work well.
> What's the usual method of supplying the ham? Does that have to be done
> manually (how often?)


it doesn't have to be done manually - but you *can* do it manually.


> or has anyone come up with a way to automatically supply ham?


it's called auto_learn (the bayes_auto_learn option) [works for me]

you'll find all the details in

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt

LEARNING OPTIONS
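
To make the idea concrete, here is a simplified Python sketch of the decision auto-learning makes for each scanned message. The two defaults mirror the documented `bayes_auto_learn_threshold_nonspam` (0.1) and `bayes_auto_learn_threshold_spam` (12.0) settings; the function name `autolearn_decision` is made up for this example. The real logic has further conditions that are deliberately left out here (for instance, the score used for this decision is computed separately from the final message score, and spam learning has additional header/body point requirements), so treat this as an illustration of the thresholds only.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam', or None for a message's auto-learn score.

    Simplified sketch: very low scores are learned as ham, very high
    scores as spam, and everything in between is not learned at all.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


# Example: three messages with different scores
for s in (-1.2, 5.0, 15.3):
    print(s, autolearn_decision(s))
```

The wide gap between the two thresholds is the point of the design: only messages the other rules are already very sure about get fed back into bayes, which is how ham arrives without anyone sorting it by hand.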



> I have the spamtrap email boxes that receive spam-only but all the real
> email addresses on the server receive a mix of ham and spam, which is
> why I need spamassassin in the first place :)  I can't find anything in
> spamassassin docs so far that explains a non-manual way of supplying
> ham. Have I missed something? Is there some sort of service where I can
> subscribe to an updated ham corpus automatically like with the clamav
> database?


your ham is specific to your traffic - you cannot inherit somebody
else's ham and expect it to work nicely with your traffic.


You'll soon read a dozen ways to do it.

I'll add mine: I use autolearn AND feed bayes trap data to a 6GB Redis
DB [works for me]


Axb