http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4787
------- Additional Comments From [EMAIL PROTECTED] 2006-07-05 15:42 -------
Created an attachment (id=3567)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3567&action=view)
proof of concept

This patch implements token tracking. It prevents the problem where lots of ham and spam have been learned, but all the tokens for one class have since been expired. That causes Bayes to lean too far one way (i.e. BAYES_00 or BAYES_99 on all mail).

It also implements ham:spam ratio restrictions. These prevent the autolearner from learning too much ham when the ratio is high, and too much spam when the ratio is low.

The proof-of-concept code only applies to BayesStore/SQL.pm, so to test it you need to be using:

  bayes_store_module Mail::SpamAssassin::BayesStore::SQL

The box I'm testing on learns a lot of spam and little ham, so the token ratio always sits at the bottom end of the min ratio:

  [12005] dbg: bayes: ham:spam token ratio (0.74:1), min ratio (0.75:1), max ratio (1.25:1)
  [12005] dbg: bayes: skip autolearn of spam because ham:spam token ratio (0.74) is less than min ratio (0.75)

As you can see from the autolearn results, it has skipped a bunch of spam learns today:

  # grep -c autolearn=ham spamd.log
  652
  # grep -c autolearn=spam spamd.log
  859
  # grep -c autolearn=unavailable spamd.log
  5141

But that's because I've set my min/max ratios very close together, at 0.75-1.25. If you want to learn a lot more spam, you can simply use 0.5-2.0, which is the default. You could even lower the 0.5 to something like 0.25 if you want to learn up to 4x more spam than ham.

Be aware that this code is not drop-in ready. It requires a couple of SQL ALTERs to track the spam/ham token counts:

  ALTER TABLE bayes_vars ADD spam_token_count int(11) NOT NULL default '0' AFTER token_count;
  ALTER TABLE bayes_vars ADD ham_token_count int(11) NOT NULL default '0' AFTER spam_token_count;
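To illustrate why per-class token counts matter, here is a minimal Python sketch. It is not the patch's Perl code. The class and method names (BayesVars, TokenStats, learn, expire) are hypothetical. The definition "ham_token_count = number of tokens with a nonzero ham count" is also an assumption, since the comment does not say exactly what the new bayes_vars columns hold. The sketch shows how message counts (nham/nspam) can still look healthy after an expiry run that has removed every ham token.

```python
from dataclasses import dataclass, field


@dataclass
class TokenStats:
    spam_count: int = 0
    ham_count: int = 0
    atime: int = 0  # last access time, used to decide expiry


@dataclass
class BayesVars:
    """Hypothetical in-memory stand-in for a bayes_vars row plus its token table."""
    nspam: int = 0
    nham: int = 0
    spam_token_count: int = 0  # ASSUMED meaning: tokens with spam_count > 0
    ham_token_count: int = 0   # ASSUMED meaning: tokens with ham_count > 0
    tokens: dict = field(default_factory=dict)

    def learn(self, words, is_spam: bool, now: int) -> None:
        # Message counts always go up, whatever happens to the tokens later.
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        # Each distinct token is counted once per message (a simplification).
        for w in set(words):
            st = self.tokens.setdefault(w, TokenStats())
            st.atime = now
            if is_spam:
                if st.spam_count == 0:
                    self.spam_token_count += 1  # token newly seen in spam
                st.spam_count += 1
            else:
                if st.ham_count == 0:
                    self.ham_token_count += 1  # token newly seen in ham
                st.ham_count += 1

    def expire(self, older_than: int) -> None:
        # Drop stale tokens and keep the per-class token counts in sync.
        stale = [w for w, st in self.tokens.items() if st.atime < older_than]
        for w in stale:
            st = self.tokens.pop(w)
            if st.spam_count > 0:
                self.spam_token_count -= 1
            if st.ham_count > 0:
                self.ham_token_count -= 1


# Example: ham learned long ago, spam learned recently, then an expiry run.
store = BayesVars()
for _ in range(5):
    store.learn(["meeting", "lunch", "agenda"], is_spam=False, now=1)
for _ in range(5):
    store.learn(["viagra", "winner"], is_spam=True, now=10)
store.expire(older_than=5)
print(store.nham, store.nspam, store.ham_token_count, store.spam_token_count)
# prints: 5 5 0 2
```

nham still reports 5 learned ham messages, but no ham tokens survive. That is the "lean too far one way" situation described above, and it is only visible once token counts are tracked per class.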
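The ratio restriction can be sketched as a simple gate, following the dbg output above: skip a spam autolearn when the ham:spam token ratio is below the min ratio, and skip a ham autolearn when it is above the max ratio. This is a hypothetical Python illustration, not the patch itself. The function names are invented, and the handling of an empty database and of zero spam tokens is an assumption. The token counts in the example are illustrative numbers chosen to give 0.74:1, not real data.

```python
import math


def ham_spam_ratio(ham_tokens: int, spam_tokens: int) -> float:
    """Return ham_tokens / spam_tokens, or math.inf when there are no spam tokens."""
    if spam_tokens == 0:
        return math.inf
    return ham_tokens / spam_tokens


def autolearn_allowed(learn_as_spam: bool, ham_tokens: int, spam_tokens: int,
                      min_ratio: float = 0.5, max_ratio: float = 2.0) -> bool:
    """Hypothetical ratio gate for autolearning (defaults follow the 0.5-2.0 range)."""
    if ham_tokens == 0 and spam_tokens == 0:
        # ASSUMPTION: an empty database has no ratio to protect, so allow learning.
        return True
    ratio = ham_spam_ratio(ham_tokens, spam_tokens)
    if learn_as_spam and ratio < min_ratio:
        # Too little ham relative to spam: more spam would skew Bayes further.
        return False
    if not learn_as_spam and ratio > max_ratio:
        # Too much ham relative to spam: more ham would skew Bayes further.
        return False
    return True


# Illustrative counts giving a 0.74:1 ratio, checked against 0.75-1.25 bounds.
print(autolearn_allowed(True, 74_000, 100_000, min_ratio=0.75, max_ratio=1.25))   # False: spam skipped
print(autolearn_allowed(False, 74_000, 100_000, min_ratio=0.75, max_ratio=1.25))  # True: ham still learned
print(autolearn_allowed(True, 74_000, 100_000))  # True under the 0.5-2.0 range
```

Spam learning stops once ham/spam drops below min_ratio. Spam tokens can therefore grow to roughly 1/min_ratio times the ham tokens: 2x for 0.5, and 4x for 0.25. Likewise, max_ratio caps ham at roughly max_ratio times the spam tokens.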
