https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8094
Bug ID: 8094
Summary: Unbalanced Bayes ham/spam ratio in db makes accuracy plummet
Product: Spamassassin
Version: 4.0.0
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Learner
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: Undefined
spamassassin-4.0.0-0.30.svn1903083
Does the SA Bayes implementation assume a 50-50 ham-spam ratio?
We have been seeing poor accuracy on systems where the ratio is unbalanced,
ranging between 97-3 and 83-17.
Is it possible to change that, or to provide an alternative Bayes implementation
that also takes into account the probability implied by the db ratio of tokens?
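To make the question concrete, here is a minimal sketch of how a class prior enters a plain naive Bayes posterior in log-odds form. It is for illustration only: it is not SpamAssassin code and makes no claim about how the SA Bayes plugin actually combines token probabilities. The function name, the `prior_spam` parameter and the token evidence are all hypothetical.

```python
import math

def spam_posterior(token_llrs, prior_spam=0.5):
    """Naive Bayes posterior P(spam | tokens), computed in log-odds form.

    token_llrs: per-token log likelihood ratios, log(P(tok|spam) / P(tok|ham)).
    prior_spam: assumed prior P(spam); 0.5 corresponds to a 50-50 assumption.
    """
    # Prior log-odds plus the summed token evidence.
    log_odds = math.log(prior_spam / (1.0 - prior_spam)) + sum(token_llrs)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence: three tokens, each twice as likely in spam as in ham.
llrs = [math.log(2.0)] * 3

balanced = spam_posterior(llrs)                  # 50-50 prior
skewed = spam_posterior(llrs, prior_spam=0.17)   # 83-17 ham-spam prior
print(f"balanced prior: {balanced:.3f}, skewed prior: {skewed:.3f}")
```

With the same token evidence, the skewed prior gives a lower spam probability than the balanced one, which is the kind of ratio-dependent adjustment being asked about.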
Here is an example of such a system, where the ratio is unbalanced:
$ sa-learn --dump magic
--
0.000 0 3 0 non-token data: bayes db version
0.000 0 6184682 0 non-token data: nspam
0.000 0 29523157 0 non-token data: nham
0.000 0 2225793 0 non-token data: ntokens
--
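The ham-spam split can be derived from the nspam and nham values in this dump. A minimal sketch follows, assuming the value of each "non-token data" line sits in the third whitespace-separated field, as it does in the output above. The helper name `parse_magic` is hypothetical and not part of sa-learn.

```python
# Sketch: derive the ham/spam split from `sa-learn --dump magic` output.
# The sample text is copied from the dump in this report.
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 6184682 0 non-token data: nspam
0.000 0 29523157 0 non-token data: nham
0.000 0 2225793 0 non-token data: ntokens
"""

def parse_magic(dump: str) -> dict:
    """Map each 'non-token data' label to the integer in the third field."""
    values = {}
    for line in dump.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        # Skip delimiter lines such as "--" and anything without a label.
        if sep and len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values

magic = parse_magic(DUMP)
total = magic["nspam"] + magic["nham"]
ham_share = magic["nham"] / total
spam_share = magic["nspam"] / total
print(f"ham {ham_share:.1%}, spam {spam_share:.1%}")
```

For the numbers above this comes out at roughly 83-17 ham-spam, the less skewed end of the range mentioned in the report.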