One feature that could be useful is letting ASSP automatically scale how much spam and nonspam it collects, based on a couple of factors. It is often too easy to let the Bayesian database get skewed to one side (usually heavy on the spam side) because of imbalanced collecting, such as collecting at 1:1 with an 80% spam rate.
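To make the imbalance concrete, here is a small worked sketch. It assumes (my reading, not necessarily ASSP's own semantics) that a collecting ratio of 1:N means every nonspam message is kept and one in every N spam messages is kept. The function names are hypothetical.

```python
def corpus_mix(spam_rate: float, keep_one_spam_in: int) -> float:
    """Fraction of the collected corpus that is spam.

    Assumes every nonspam message is collected and one in every
    `keep_one_spam_in` spam messages is collected (a hypothetical 1:N ratio).
    """
    spam = spam_rate / keep_one_spam_in
    ham = 1.0 - spam_rate
    return spam / (spam + ham)


def balancing_ratio(spam_rate: float) -> int:
    """Whole N for which a 1:N ratio gives roughly equal spam and nonspam."""
    return round(spam_rate / (1.0 - spam_rate))


print(f"{corpus_mix(0.80, 1):.2f}")  # 0.80: collecting 1:1 at 80% spam gives a 4:1 corpus
print(f"{corpus_mix(0.80, 4):.2f}")  # 0.50
print(balancing_ratio(0.80))          # 4
print(balancing_ratio(0.90))          # 9
```

Under that assumption, 1:4 only balances new collections at 80% spam, and about 1:9 at 90% spam; neither does anything to correct a corpus that is already skewed.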
Perhaps ASSP could look at rebuildrun.txt, check the value of the weighted norm, and then decide whether it needs to adjust the collecting in one direction or the other. It could also look at the Non-Local Mail Blocked figure (or another spam-ratio indicator) to see how far it needs to skew the collecting (1:2, 1:4, and so on). I have close to 90% spam, so I've been using 1:10 to get my corpus norm down from 3.5. For cases like mine, where the corpus is heavily skewed, it would need to push the ratio even further (1:15, 1:20) and then level out once the norm nears 1.0.

Any thoughts?

_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
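The adjustment loop proposed above could look roughly like the following Python sketch. Everything in it is hypothetical: ASSP exposes no such function, reading the weighted norm out of rebuildrun.txt is left to the caller, and the scaling rule (multiply the balancing ratio by the norm, hold steady within a small band around 1.0, cap at 1:20) is just one simple choice, assuming a norm above 1.0 means the corpus is spam-heavy.

```python
def next_spam_ratio(norm: float, spam_rate: float,
                    deadband: float = 0.1, max_ratio: int = 20) -> int:
    """Suggest N for a hypothetical 1:N spam-collecting ratio.

    norm: weighted corpus norm, as read from rebuildrun.txt by the caller
          (assumed: above 1.0 means the corpus is spam-heavy).
    spam_rate: fraction of incoming mail that is spam, e.g. derived from
               the Non-Local Mail Blocked statistic.
    """
    if norm <= 0:
        raise ValueError("norm must be positive")
    if not 0.0 <= spam_rate < 1.0:
        raise ValueError("spam_rate must be in [0, 1)")

    # Ratio that keeps new collections balanced given the current spam rate.
    base = max(1.0, spam_rate / (1.0 - spam_rate))

    # Close enough to 1.0: level out and just hold the balancing ratio.
    # Otherwise scale by the norm, pushing harder the more skewed the corpus is.
    factor = 1.0 if abs(norm - 1.0) <= deadband else norm

    # Clamp to 1..max_ratio; this sketch never throttles nonspam collection.
    return max(1, min(max_ratio, round(base * factor)))


print(next_spam_ratio(3.5, 0.90))   # 20 (capped)
print(next_spam_ratio(1.05, 0.90))  # 9
print(next_spam_ratio(2.0, 0.80))   # 8
```

With a norm of 3.5 and about 90% spam, the sketch hits the 1:20 cap; once the norm is back within 0.1 of 1.0 it settles at the plain balancing ratio of 1:9. A purely proportional rule like this could overshoot between rebuilds, so a real version would probably want to damp the change.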
