On Tuesday, May 10, 2005, 1:34:27 PM, Theo Dinter wrote:

>> > On Mon, May 02, 2005 at 03:16:11PM -0700, Jeff Chan wrote:
>> >> If so can you provide a before and after ham/spam summary as of
>> >> say a week ago and now-ish?

>> > For SpamAssassin, our last weekly run (does net checks) failed due to a code
>> > issue. Oops. In theory, the next weekly run (on Saturdays) should occur and
>> > we can compare it to the results from 2 weeks ago.
> Ok, the two run sizes are a bit different, but you can get general stats from
> this I think:
>
> Previous run (4/23):
> OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
>   182168   109007    73161  0.598   0.00   0.00  (all messages)
>   12.750  21.3041   0.0055  1.000   0.99   0.00  URIBL_SC_SURBL
>   36.463  60.9135   0.0328  0.999   0.98   0.00  URIBL_JP_SURBL
>    9.809  16.3843   0.0109  0.999   0.97   0.00  URIBL_AB_SURBL
>   36.982  61.7355   0.1011  0.998   0.89   0.00  URIBL_WS_SURBL
>   38.506  64.2683   0.1203  0.998   0.87   0.00  URIBL_OB_SURBL
>    0.211   0.3532   0.0000  1.000   0.66   0.00  URIBL_PH_SURBL
>
> Latest run (5/8):
> OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
>   339239   240537    98702  0.709   0.00   0.00  (all messages)
>  100.000  70.9049  29.0951  0.709   0.00   0.00  (all messages as %)
>   13.111  18.4895   0.0020  1.000   0.98   0.00  URIBL_SC_SURBL
>   37.333  52.6451   0.0172  1.000   0.98   0.00  URIBL_JP_SURBL
>    8.836  12.4600   0.0041  1.000   0.97   0.00  URIBL_AB_SURBL
>   38.140  53.7672   0.0567  0.999   0.91   0.00  URIBL_OB_SURBL
>   40.770  57.4652   0.0841  0.999   0.87   0.00  URIBL_WS_SURBL
>    0.215   0.3035   0.0000  1.000   0.61   0.00  URIBL_PH_SURBL

Thanks. The differing corpus sizes make the runs difficult to compare, however.
For example, the 5/8 spam count is more than double the 4/23 count, but the
ham count is only about 35% larger. Therefore the percentages are not directly
comparable.

Assuming the SPAM% and HAM% columns give the percentage of that class's
messages (spam or ham) that hit each rule, multiplying each HAM% figure by the
ham total at the top of its column gives the approximate number of ham hits
per list:

4/23
NAME            ham hits
URIBL_SC_SURBL         4
URIBL_JP_SURBL        24
URIBL_AB_SURBL         8
URIBL_WS_SURBL        74
URIBL_OB_SURBL        88
URIBL_PH_SURBL         0

5/8
NAME            ham hits
URIBL_SC_SURBL         2
URIBL_JP_SURBL        17
URIBL_AB_SURBL         4
URIBL_WS_SURBL        83
URIBL_OB_SURBL        56
URIBL_PH_SURBL         0

(If my assumption is wrong, please let me know how to correct it.)

On a 35% larger ham corpus, WS hit 83 hams versus 74 before. In a sense that's
a step in the wrong direction, but the differing ham corpora make conclusions
difficult.

Jeff C.
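For anyone who wants to redo the per-list ham hit counts, here is a minimal
Python sketch of the same arithmetic. It rests on the same assumption stated
above (HAM% is the share of all ham messages that hit the rule). The figures
are copied from the two runs; the helper names (ham_hits, hits_by_rule) are
hypothetical and not part of SpamAssassin.

```python
# Sketch: recover approximate per-rule ham hit counts from the HAM% column.
# Assumption: HAM% is the percentage of all ham messages that hit the rule.

HAM_TOTAL = {"4/23": 73161, "5/8": 98702}
SPAM_TOTAL = {"4/23": 109007, "5/8": 240537}

HAM_PCT = {
    "4/23": {
        "URIBL_SC_SURBL": 0.0055,
        "URIBL_JP_SURBL": 0.0328,
        "URIBL_AB_SURBL": 0.0109,
        "URIBL_WS_SURBL": 0.1011,
        "URIBL_OB_SURBL": 0.1203,
        "URIBL_PH_SURBL": 0.0000,
    },
    "5/8": {
        "URIBL_SC_SURBL": 0.0020,
        "URIBL_JP_SURBL": 0.0172,
        "URIBL_AB_SURBL": 0.0041,
        "URIBL_WS_SURBL": 0.0841,
        "URIBL_OB_SURBL": 0.0567,
        "URIBL_PH_SURBL": 0.0000,
    },
}


def ham_hits(pct: float, total: int) -> int:
    """Convert a percentage of the ham corpus into an approximate hit count."""
    return round(pct / 100 * total)


def hits_by_rule(run: str) -> dict[str, int]:
    """Approximate ham hit count for every rule in one run."""
    total = HAM_TOTAL[run]
    return {rule: ham_hits(pct, total) for rule, pct in HAM_PCT[run].items()}


if __name__ == "__main__":
    for run in ("4/23", "5/8"):
        print(run, hits_by_rule(run))
    # How much each corpus grew between the two runs.
    print("ham growth:  %.3f" % (HAM_TOTAL["5/8"] / HAM_TOTAL["4/23"]))
    print("spam growth: %.3f" % (SPAM_TOTAL["5/8"] / SPAM_TOTAL["4/23"]))
```

Rounding to the nearest whole message is enough here, since none of the
products lands near a half.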