https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5861





--- Comment #17 from Justin Mason <[email protected]>  2009-04-01 01:44:55 PST ---
(In reply to comment #14)
> I don't see how it's relevant, but no. It's from some US uni.
> 
> The point is that there probably should be some limit on how many tokens to
> get from a header. If I learn that as spam, all ham mail containing those
> headers will be strongly biased to spam (an uneducated, but logical guess).

I think you're overestimating its effects on the chi-square probability
combining algorithm; actually, there's a good chance those values won't skew it
much, assuming there are stronger tokens found elsewhere.
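
To illustrate the point about stronger tokens, here is a minimal sketch of
Fisher/Robinson-style chi-square combining in Python. It is not SpamAssassin's
actual Bayes code; the function names (chi2q, combine) and the token
probabilities are made up for the example, and it assumes every probability
lies strictly between 0 and 1.

```python
import math

def chi2q(x2, dof):
    """Upper-tail probability of a chi-square variable (even dof only)."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = total = math.exp(-m)  # underflows to 0.0 for very large x2
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Assumes 0 < p < 1 for every token; 0.5 means "no evidence either way".
    """
    n = len(probs)
    if n == 0:
        return 0.5
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_stat = -2.0 * sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(spam_stat, 2 * n)  # near 1 when tokens look spammy
    h = 1.0 - chi2q(ham_stat, 2 * n)   # near 1 when tokens look hammy
    return (s - h + 1.0) / 2.0

# Hypothetical values: 15 strongly hammy tokens from the body, plus
# 5 header tokens that were learned as moderately spammy.
strong_ham = [0.01] * 15
header_tokens = [0.9] * 5

print(combine(strong_ham))
print(combine(strong_ham + header_tokens))
```

With these made-up numbers the combined score stays close to the ham end even
after the spammy header tokens are added, which is the behaviour described
above. Real token probabilities will differ, so this only shows the shape of
the argument.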

The only way to get a useful idea of what's really happening is to do a
10-fold cross-validation run:
http://wiki.apache.org/spamassassin/TenFoldCrossValidation
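
As a rough sketch of what 10-fold cross-validation means in general (this is
not the procedure or the scripts from the wiki page above), the corpus is
split into 10 parts, and each part is held out once for testing while the
other 9 are used for training. The helper name k_fold_splits is hypothetical.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists; each item lands in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps runs repeatable
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Stand-in corpus: message IDs 0..99. In a real run these would be mail
# files, and each round would train on `train` and score `test`.
for train, test in k_fold_splits(range(100)):
    print(len(train), len(test))
```

Comparing the results across the 10 rounds, with and without a limit on
header tokens, would show whether the extra tokens actually change
classification.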


