Re: Bayes auto-learning a bad idea?
On 28.09.11 10:07, Lars Jørgensen wrote:
> Not sure if this is the correct forum, but google couldn't help me (or
> I am too low on caffeine).
>
> I get a lot of spam that would have been flagged as such, but a bayes
> score of -1.9 pulls it down to hammy status. I train Bayes manually on
> the borderline cases, but also have auto-learning enabled. Is that
> really a bad idea? Should I disable it, delete the bayes-databases and
> start over on manual-only learning?

Do you run manual learning? Relying only on automatic learning can
easily make things go wrong and lead people to think Bayes is bad.

If you re-train on the messages that misfired, you should get BAYES
hitting properly soon. (Provided you didn't misconfigure e.g.
trusted_networks or internal_networks. That could break SA very
"effectively".)

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
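As an illustration of the two settings named above, here is a minimal local.cf sketch. The address range is a placeholder taken from the documentation-only block 192.0.2.0/24 and is purely an assumption; the real values depend on which relays actually handle your mail.

```
# local.cf fragment (sketch, placeholder addresses)
# Hosts that are your own mail relays / MX servers
internal_networks 192.0.2.0/24

# Hosts whose Received headers you trust not to be forged
# (normally a superset of internal_networks)
trusted_networks  192.0.2.0/24
```

If these ranges are wrong, SpamAssassin may evaluate the wrong relay as the sending host, which can skew both rule hits and what gets auto-learned.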
Re: Bayes auto-learning a bad idea?
On Wed, 28 Sep 2011 14:30:32 +0200 Lars Jørgensen wrote:
> Looking at
> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#learning_options
>
> i see an option called "bayes_use_hapaxes" that promises
> significantly better hit-rates, but also increases database size by a
> factor of 8 to 10.

I've never understood what this is supposed to mean, and I suspect
it's just plain wrong. bayes_use_hapaxes determines whether hapaxes
(tokens with a total count of 1) are used in the calculation. It
doesn't affect whether they are stored, and it can't, since all tokens
start off as hapaxes. It might have a marginal effect through the
updating of atimes, but in that case it's expediting the removal of
the most useful hapaxes.

> What is the recommendation on this?

I'd leave it on.
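For reference, "leaving it on" corresponds to the following local.cf line. This is only a sketch; as far as the Mail::SpamAssassin::Conf documentation indicates, 1 is already the default, so the line is needed only if the option was turned off elsewhere.

```
# local.cf fragment (sketch)
# Use tokens seen only once (hapaxes) in the Bayes calculation
bayes_use_hapaxes 1
```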
Re: Bayes auto-learning a bad idea?
On Wed, 28 Sep 2011 14:30:32 +0200, Lars Jørgensen wrote:
> On 28-09-2011 13:20, Benny Pedersen wrote:
>>> I train Bayes manually on the borderline cases, but also have
>>> auto-learning enabled. Is that really a bad idea? Should I disable
>>> it, delete the bayes-databases and start over on manual-only
>>> learning?
>>
>> no training is always good
>
> Are you missing a comma? Do you mean "no, training is always good" or
> "no training is always good"?

no, just my boolean algebra and english are bad :)

>> what score are you learning on ?, default is -0.1 and 12.0, i have
>> changed them here to -4 and 14
>
> Can't find any settings to that effect, so I guess I am using
> defaults. I have entered your settings in my config now.

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

> Looking at
> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#learning_options
>
> i see an option called "bayes_use_hapaxes" that promises
> significantly better hit-rates, but also increases database size by a
> factor of 8 to 10. What is the recommendation on this?

don't know for sure what is best there, using the default here

perldoc Mail::SpamAssassin::Plugin::Bayes
perldoc Mail::SpamAssassin::Conf

for 3.3.1 and above i add this in local.cf:

    bayes_auto_learn_on_error 1

it reduces bayes poisoning and load

> If throughput is a factor in this decision, we are scanning about
> 60,000 to 90,000 mails a day.

more than my server handles now

>> what plugins have you enabled ?
>
> DCC pyzor/razor SpamCop AutoLearnThreshold TextCat MIMEHeader
> ReplaceTags DKIM Check HTTPSMismatch URIDetail Bayes All the EvalTest
> plugins VBounce ImageInfo FreeMail
>
>> 3rd party rules or just default sa 3.3.2 ?
>
> Default and Sought Rules.

should be safe enough to not give any problem to bayes

tip: if you would like to restart bayes learning, you can do it like
this:

    sa-learn --dump magic

    bayes_min_ham_num (Default: 200)
    bayes_min_spam_num (Default: 200)

and set these to 200 more than the counts listed in dump magic; this
ensures that bayes goes back into learning mode
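The settings discussed in this message could be collected into a local.cf fragment roughly like the one below. This is a sketch, not a recommendation: the thresholds are simply the values quoted in the thread, and the ham/spam counts are hypothetical numbers standing in for whatever `sa-learn --dump magic` reports on your system. Note that the AutoLearnThreshold documentation, as far as I recall, lists the default non-spam threshold as 0.1 rather than -0.1, so it is worth checking with perldoc before relying on the figure above.

```
# local.cf fragment (sketch)

# Auto-learn thresholds as quoted in the thread (-4 for ham, 14 for spam)
bayes_auto_learn_threshold_nonspam -4
bayes_auto_learn_threshold_spam    14

# Setting named in the thread; check that your version supports it
bayes_auto_learn_on_error 1

# HYPOTHETICAL counts: suppose "sa-learn --dump magic" reported
# nham = 1500 and nspam = 3000. Adding 200 to each, as the tip suggests:
bayes_min_ham_num  1700
bayes_min_spam_num 3200
```

With the minimums set above the current counts, Bayes should stop scoring until that many additional messages have been learned, which is the "learning mode" the tip refers to. Running `spamassassin --lint` after editing is a quick way to catch typos in the option names.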
Re: Bayes auto-learning a bad idea?
On 28-09-2011 13:20, Benny Pedersen wrote:
>> I train Bayes manually on the borderline cases, but also have
>> auto-learning enabled. Is that really a bad idea? Should I disable
>> it, delete the bayes-databases and start over on manual-only
>> learning?
>
> no training is always good

Are you missing a comma? Do you mean "no, training is always good" or
"no training is always good"?

> what score are you learning on ?, default is -0.1 and 12.0, i have
> changed them here to -4 and 14

Can't find any settings to that effect, so I guess I am using
defaults. I have entered your settings in my config now.

Looking at
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#learning_options

I see an option called "bayes_use_hapaxes" that promises significantly
better hit-rates, but also increases database size by a factor of 8 to
10. What is the recommendation on this?

If throughput is a factor in this decision, we are scanning about
60,000 to 90,000 mails a day.

> what plugins have you enabled ?

DCC pyzor/razor SpamCop AutoLearnThreshold TextCat MIMEHeader
ReplaceTags DKIM Check HTTPSMismatch URIDetail Bayes All the EvalTest
plugins VBounce ImageInfo FreeMail

> 3rd party rules or just default sa 3.3.2 ?

Default and Sought Rules.

-- 
Lars
Re: Bayes auto-learning a bad idea?
On Wed, 28 Sep 2011 10:07:55 +0200, Lars Jørgensen wrote:
> Hi,
>
> Not sure if this is the correct forum, but google couldn't help me (or
> I am too low on caffeine).
>
> I get a lot of spam that would have been flagged as such, but a bayes
> score of -1.9 pulls it down to hammy status. I train Bayes manually on
> the borderline cases, but also have auto-learning enabled. Is that
> really a bad idea? Should I disable it, delete the bayes-databases and
> start over on manual-only learning?

no training is always good, it's more that bayes being unsure is the
problem. when it autolearns, it does so on the whole content/headers,
so the more headers/content there are to scan, the better bayes can
track what you want as ham/spam

what score are you learning on ?, default is -0.1 and 12.0, i have
changed them here to -4 and 14

what plugins have you enabled ?

3rd party rules or just default sa 3.3.2 ?