http://bugzilla.spamassassin.org/show_bug.cgi?id=4505





------- Additional Comments From [EMAIL PROTECTED]  2005-08-06 15:56 -------
OK, I got hold of the logs from Henry, and measured some BAYES scores
against the validation set:

base results from comment 28, gen-set3-2.0-5.0-100-nobob:
# Correctly non-spam:  53070  99.96%
# Correctly spam:     121906  98.49%
# False positives:        21  0.04%
# False negatives:      1872  1.51%
# TCR(l=50): 42.360712  SpamRecall: 98.488%  SpamPrec: 99.983%

copying values from set 2 for set 3:
# Correctly non-spam:  53064  99.95%
# Correctly spam:     122453  98.93%
# False positives:        27  0.05%
# False negatives:      1325  1.07%
# TCR(l=50): 46.272150  SpamRecall: 98.930%  SpamPrec: 99.978%

comment 14:
# Correctly non-spam:  53014  99.85%
# Correctly spam:     123093  99.45%
# False positives:        77  0.15%
# False negatives:       685  0.55%
# TCR(l=50): 27.293936  SpamRecall: 99.447%  SpamPrec: 99.937%

comment 42 (the patch in attachment 3051):
# Correctly non-spam:  53068  99.96%
# Correctly spam:     122509  98.97%
# False positives:        23  0.04%
# False negatives:      1269  1.03%
# TCR(l=50): 51.169078  SpamRecall: 98.975%  SpamPrec: 99.981%

I think 3051 has the best scores: fewer FNs than the base, just 2 more
FPs, and sane scores.  I'd suggest we just vote on that patch.
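
For anyone who wants to sanity-check the summary lines, here is a minimal
sketch of how the TCR, SpamRecall and SpamPrec figures appear to be derived
from the raw counts.  The formula TCR = nspam / (lambda * FP + FN) is inferred
from the numbers above, not read out of fp-fn-statistics itself, and the
function name summary_stats is made up for this example.

```python
def summary_stats(tp, fp, fn, lam=50):
    """Return (TCR, spam recall, spam precision) from raw counts.

    tp  = correctly classified spam
    fp  = false positives (ham flagged as spam)
    fn  = false negatives (spam that got through)
    lam = cost weight of one FP relative to one FN (the "l=50" above)

    Assumed formula, inferred from the reported numbers:
        TCR = nspam / (lam * fp + fn)
    Raises ZeroDivisionError if there are no errors at all.
    """
    n_spam = tp + fn
    tcr = n_spam / (lam * fp + fn)
    recall = tp / n_spam
    precision = tp / (tp + fp)
    return tcr, recall, precision


# Counts copied from the four result blocks above: (tp, fp, fn).
RESULTS = {
    "base (comment 28)": (121906, 21, 1872),
    "set 2 values for set 3": (122453, 27, 1325),
    "comment 14": (123093, 77, 685),
    "comment 42 (attachment 3051)": (122509, 23, 1269),
}

if __name__ == "__main__":
    for name, (tp, fp, fn) in RESULTS.items():
        tcr, recall, prec = summary_stats(tp, fp, fn)
        print(f"{name:30s} TCR(l=50): {tcr:.6f}  "
              f"SpamRecall: {recall:.3%}  SpamPrec: {prec:.3%}")
```

With lambda at 50, each FP costs as much as 50 FNs, which is why comment 14
scores worst on TCR despite having the fewest FNs.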

If you want to try other values, btw, the logs are in the zone.  Do this:

  cd svncheckout/masses
  rm ham.log spam.log
  ln -s \
    /home/corpus-rsync/corpus/scoregen-3.1/gen-set3-2.0-5.0-100-nobob/NSBASE/ham-test.log \
    ham.log
  ln -s \
    /home/corpus-rsync/corpus/scoregen-3.1/gen-set3-2.0-5.0-100-nobob/SPBASE/spam-test.log \
    spam.log
  vi ../rules/50_scores.cf
  ./fp-fn-statistics --scoreset=3



