https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247

--- Comment #54 from Justin Mason <[email protected]> 2009-12-17 09:38:24 UTC ---
Thanks for measuring this, Warren.

(In reply to comment #51)
> rescore masscheck logs with mcsnapshot + today's trunk Scores
> =======================================================
> # SUMMARY for threshold 5.0:
> # Correctly non-spam: 703979  99.95%
> # Correctly spam:     2562432  98.39%
> # False positives:       387  0.05%
> # False negatives:     41888  1.61%
> # TCR(l=50): 42.527842  SpamRecall: 98.392%  SpamPrec: 99.985%

vs.

> rescore masscheck logs with mcsnapshot + today's trunk Scores
> HABEAS, BSP and SSC and DNSWL Disabled
> ==========================
> # SUMMARY for threshold 5.0:
> # Correctly non-spam: 703899  99.93%
> # Correctly spam:     2565063  98.49%
> # False positives:       467  0.07%
> # False negatives:     39257  1.51%
> # TCR(l=50): 41.597904  SpamRecall: 98.493%  SpamPrec: 99.982%

So that means that 

467-387
--> 80    
80/(703899+467)
--> 0.000113577316338381  

    => 0.01136% of hams were rescued from being FPs by the disabled
       whitelist rules (HABEAS, BSP, SSC and DNSWL).

41888-39257
--> 2631                
2631/(2565063+39257)
--> 0.00101024451680285  

    => 0.101% of spams were, conversely, allowed through by them.
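
For anyone who wants to re-run the arithmetic, here is a minimal Python sketch
using only the counts from the two quoted summaries. The dictionary keys and
helper names are made up for illustration. The tcr() helper assumes the usual
total-cost-ratio definition, spam total / (l * FP + FN); that definition is an
assumption here, although it does reproduce the TCR(l=50) values printed above.

```python
# Counts copied from the two quoted rescore summaries (threshold 5.0).
# "enabled" is the stock trunk run, "disabled" is the run with
# HABEAS, BSP, SSC and DNSWL turned off.
enabled = {"tn": 703979, "tp": 2562432, "fp": 387, "fn": 41888}
disabled = {"tn": 703899, "tp": 2565063, "fp": 467, "fn": 39257}


def totals(run):
    """Return (total ham, total spam) for one summary."""
    return run["tn"] + run["fp"], run["tp"] + run["fn"]


def tcr(run, cost=50):
    """Total cost ratio, assuming TCR = spam total / (cost * FP + FN)."""
    _, spam = totals(run)
    return spam / (cost * run["fp"] + run["fn"])


ham_total, spam_total = totals(disabled)
# Both runs score the same corpus, so the totals must agree.
assert (ham_total, spam_total) == totals(enabled)

rescued_hams = disabled["fp"] - enabled["fp"]   # FPs avoided by the rules
missed_spams = enabled["fn"] - disabled["fn"]   # extra FNs caused by them

ham_rescue_rate = rescued_hams / ham_total
spam_miss_rate = missed_spams / spam_total

print(f"hams rescued: {rescued_hams} ({ham_rescue_rate:.5%})")
print(f"spams missed: {missed_spams} ({spam_miss_rate:.3%})")
print(f"TCR enabled:  {tcr(enabled):.6f}")
print(f"TCR disabled: {tcr(disabled):.6f}")
```

The ham and spam totals are the same in both runs, so it makes no difference
which summary supplies the denominators.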


Good to know, and good to get an idea of the problem.  FWIW, I think it's
better to use the rescore data for this measurement -- more contributors, more
varied logs, and (hopefully) the data will have received more hand-checking
before submission.

By the way, I don't know whether it's safe to call this "statistically
significant".  To use that terminology we would need a stated null hypothesis,
and we don't have one in this case.

--j.

