On 9/25/07, David <[EMAIL PROTECTED]> wrote:
> One feature that may be useful would be to let ASSP automatically scale
> how much spam/nonspam it collects, based on a couple of factors. It is
> often too easy to let the Bayesian database get skewed to one side
> (usually heavy on the spam side) due to imbalanced collecting (such as
> 1:1, with an 80% spam rate).
>
> Perhaps ASSP could look at the rebuildrun.txt, see the value of the
> weighted norm then decide if it needs to adjust the collecting in one
> direction or another. Then it would also look at the Non-Local Mail
> Blocked (or another spam ratio indicator) to see how far it needs to
> skew the collecting (1:2, 1:4; I have close to 90% spam, so I've been
> using 1:10 to get my corpus norm down from 3.5).
>
> For cases like mine where the corpus was heavily skewed, it would need
> to push the ratio even further (1:15, 1:20) then level out once the norm
> nears 1.0
>
> any thoughts?

I think this is an interesting idea.  I have thought about this before,
and have wondered whether some simple mathematics could be applied in
rebuildspamdb.pl:  1) check what the ratio of spam/ham is, and
2) automagically adjust freqNonSpam and/or freqSpam accordingly to
compensate for severely skewed ratios.
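
To make the arithmetic concrete, here is a rough sketch of what I mean.
It is Python rather than Perl and is not taken from ASSP. The function
name, the max_skew cap, and the idea of treating the corpus norm as a
plain spam/ham ratio are all my own assumptions, and I have not checked
how freqSpam/freqNonSpam are actually interpreted, so the result is
only a "keep 1 of every N spam messages" suggestion:

```python
def suggest_spam_sampling(spam_rate, corpus_norm, max_skew=20):
    """Suggest N, meaning: keep 1 of every N spam messages, keep all ham.

    Hypothetical helper, not part of ASSP.
    spam_rate   -- fraction of incoming mail that is spam (0 <= rate < 1)
    corpus_norm -- assumed here to be the corpus spam/ham ratio
                   (1.0 means balanced, 3.5 means spam-heavy)
    max_skew    -- assumed upper bound on N, e.g. 1:20
    """
    if not 0.0 <= spam_rate < 1.0:
        raise ValueError("spam_rate must be in [0, 1)")
    if corpus_norm <= 0:
        raise ValueError("corpus_norm must be positive")

    # Sampling that keeps an already balanced corpus balanced:
    # with 80% spam, 4 spam arrive for every ham, so keep 1 in 4.
    baseline = spam_rate / (1.0 - spam_rate)

    # Push harder while the corpus is spam-heavy (norm > 1),
    # and ease off as the norm approaches 1.0.
    skew = baseline * corpus_norm

    # Never go below 1:1 and never beyond the configured cap.
    return max(1, min(max_skew, round(skew)))
```

With an 80% spam rate and a norm of 1.0 this suggests 1:4. With 90%
spam and a norm of 3.5 it hits the 1:20 cap, then relaxes toward 1:9 as
the norm nears 1.0.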

-- 
ME2

_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
