ham source for site-wide bayes?
I've set up SpamAssassin with a site-wide bayes configuration. I have some spamtrap email addresses that feed fresh spam into bayes for training from a cron job. However, from what I've read, bayes needs ongoing ham as well as spam for training in order to work well.

What's the usual method of supplying the ham? Does that have to be done manually (and if so, how often?), or has anyone come up with a way to supply ham automatically? The spamtrap mailboxes receive only spam, but all the real email addresses on the server receive a mix of ham and spam, which is why I need SpamAssassin in the first place :)

I can't find anything in the SpamAssassin docs so far that explains a non-manual way of supplying ham. Have I missed something? Is there some sort of service where I can subscribe to an automatically updated ham corpus, as with the ClamAV database?

-Steve
Re: ham source for site-wide bayes?
On 5/20/2015 12:29 PM, Steve Rainwater wrote:
> I've set up spamassassin with a site-wide bayes configuration. I have some spamtrap email addresses that supply fresh spam into bayes for training on a cron job. However, from what I've read, bayes needs to have ongoing ham as well as spam for training in order to work well. What's the usual method of supplying the ham? Does that have to be done manually (how often?) or has anyone come up with a way to automatically supply ham.
>
> I have the spamtrap email boxes that receive spam-only but all the real email addresses on the server receive a mix of ham and spam, which is why I need spamassassin in the first place :) I can't find anything in spamassassin docs so far that explains a non-manual way of supplying ham. Have I missed something? Is there some sort of service where I can subscribe to an updated ham corpus automatically like with the clamav database?

One way people often supply ham is to use sent items from your legit users.

Regards,
KAM
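The sent-items suggestion above could be run from cron alongside the spamtrap job. The following is only a minimal sketch under stated assumptions: each user's mail is assumed to live in a Maildir under a common root, with a sent folder named `.Sent` (both the layout and the folder name are hypothetical and vary by server). It relies on `sa-learn --ham` accepting directories of messages, and it defaults to a dry run that only returns the command it would execute.

```python
import subprocess
from pathlib import Path


def sent_dirs(home_root, sent_folder=".Sent"):
    """Return each user's sent-mail 'cur' directory.

    Assumed (hypothetical) layout: <home_root>/<user>/Maildir/.Sent/cur
    """
    root = Path(home_root)
    pattern = f"*/Maildir/{sent_folder}/cur"
    return sorted(p for p in root.glob(pattern) if p.is_dir())


def ham_command(dirs):
    """Build the sa-learn call that trains the given directories as ham."""
    return ["sa-learn", "--ham", *[str(d) for d in dirs]]


def learn_sent_as_ham(home_root, dry_run=True):
    """Train bayes on sent mail; with dry_run=True only return the command."""
    dirs = sent_dirs(home_root)
    if not dirs:
        return []  # nothing to learn from
    cmd = ham_command(dirs)
    if not dry_run:
        # Must run as the user that owns the site-wide bayes database.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "/home" is an assumed mail root; adjust to the local setup.
    print(learn_sent_as_ham("/home"))
```

Re-running this over the same folders should be harmless, since sa-learn keeps track of messages it has already learned and skips them.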
Re: ham source for site-wide bayes?
On 20.05.2015 18:29, Steve Rainwater wrote:
> I've set up spamassassin with a site-wide bayes configuration. I have some spamtrap email addresses that supply fresh spam into bayes for training on a cron job. However, from what I've read, bayes needs to have ongoing ham as well as spam for training in order to work well. What's the usual method of supplying the ham? Does that have to be done manually (how often?)

It doesn't have to be done, but you *can* do it manually.

> or has anyone come up with a way to automatically supply ham.

It's called auto_learn [works for me]. You'll find all the details under LEARNING OPTIONS in
https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt

> I have the spamtrap email boxes that receive spam-only but all the real email addresses on the server receive a mix of ham and spam, which is why I need spamassassin in the first place :) I can't find anything in spamassassin docs so far that explains a non-manual way of supplying ham. Have I missed something? Is there some sort of service where I can subscribe to an updated ham corpus automatically like with the clamav database?

Your ham is specific to your traffic: you cannot inherit somebody else's ham and expect it to work nicely with your traffic.

You'll soon read a dozen ways to do it. I'll add mine: I use autolearn AND feed bayes trap data to a 6GB Redis DB [works for me].

Axb
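Auto-learning is controlled by the `bayes_auto_learn`, `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` options described in the LEARNING OPTIONS section linked above. One way to check whether it is actually supplying ham is to count the `autolearn=` values that SpamAssassin records in the `X-Spam-Status` header of scanned mail. The sketch below assumes the stock header format (a customized `add_header` template may omit the field), and the function names are illustrative only.

```python
import re
from collections import Counter
from email import message_from_string

# Matches "autolearn=ham" but not the separate "autolearn_force=no" field.
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_result(raw_message):
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = AUTOLEARN_RE.search(status)
    return match.group(1) if match else None


def tally(raw_messages):
    """Count autolearn outcomes (e.g. ham, spam, no) across messages."""
    return Counter(autolearn_result(raw) or "missing" for raw in raw_messages)
```

If the tally shows plenty of `spam` but almost no `ham`, the non-spam threshold may be too strict for the local traffic, and manual or sent-items training can make up the difference.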