http://bugzilla.spamassassin.org/show_bug.cgi?id=4467

           Summary: investigate setting BAYES_ scores manually instead of
                    via perceptron
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Score Generation
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


A frequent complaint around the SpamAssassin 3.0.0 release timeframe was that
BAYES_99 was scored too low, e.g.:

  http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/60217
  http://readlist.com/lists/incubator.apache.org/spamassassin-users/0/1500.html

On top of that, the scores for the BAYES_* rules are wholly dependent on
external factors that cannot be measured effectively through mass-checks to
match all environments.  For example, these setups have radically different
amounts of accurate training:

  - a site-wide autolearning system
  - a personalised, extensively hand-trained system with over 10000 mails of
    each type
  - a system that has received the bare minimum "200 of each" training, with a
    little autolearning on top
  - mass-check, with the new sampling method

As a result, I suspect that the Perceptron is going to generate scores that are
over-optimized for mass-check only, and under-optimized for the other end-user
setups.  To avoid this, I suggest that we set the BAYES_* scores manually, by
setting them as "userconf" rules.
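
As a rough sketch of what that could look like in the rules files (the score
values are placeholders for illustration only, not proposed numbers; this also
assumes the existing "learn" tflag on the BAYES_* rules is kept, and that score
generation leaves "userconf" rules alone):

  # placeholder values, for illustration only
  tflags BAYES_99 learn userconf
  score  BAYES_99 0 0 3.5 3.5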

Comments/votes please.


