On Tuesday, May 10, 2005, 1:34:27 PM, Theo Dinter wrote:
>> > On Mon, May 02, 2005 at 03:16:11PM -0700, Jeff Chan wrote:
>> >> If so can you provide a before and after ham/spam summary as of
>> >> say a week ago and now-ish?
>> 
>> > For SpamAssassin, our last weekly run (does net checks) failed due
>> > to a code issue.  Oops.  In theory, the next weekly run (on
>> > Saturdays) should occur and we can compare it to the results from
>> > 2 weeks ago.

> Ok, the two run sizes are a bit different, but you can get general stats from
> this I think:

> Previous run (4/23):

> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>  182168   109007    73161    0.598   0.00    0.00  (all messages)
>  12.750  21.3041   0.0055    1.000   0.99    0.00  URIBL_SC_SURBL
>  36.463  60.9135   0.0328    0.999   0.98    0.00  URIBL_JP_SURBL
>   9.809  16.3843   0.0109    0.999   0.97    0.00  URIBL_AB_SURBL
>  36.982  61.7355   0.1011    0.998   0.89    0.00  URIBL_WS_SURBL
>  38.506  64.2683   0.1203    0.998   0.87    0.00  URIBL_OB_SURBL
>   0.211   0.3532   0.0000    1.000   0.66    0.00  URIBL_PH_SURBL

> Latest run (5/8):

> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>  339239   240537    98702    0.709   0.00    0.00  (all messages)
> 100.000  70.9049  29.0951    0.709   0.00    0.00  (all messages as %)
>  13.111  18.4895   0.0020    1.000   0.98    0.00  URIBL_SC_SURBL
>  37.333  52.6451   0.0172    1.000   0.98    0.00  URIBL_JP_SURBL
>   8.836  12.4600   0.0041    1.000   0.97    0.00  URIBL_AB_SURBL
>  38.140  53.7672   0.0567    0.999   0.91    0.00  URIBL_OB_SURBL
>  40.770  57.4652   0.0841    0.999   0.87    0.00  URIBL_WS_SURBL
>   0.215   0.3035   0.0000    1.000   0.61    0.00  URIBL_PH_SURBL

Thanks.  The differing corpus sizes make the two runs difficult to
compare, however.  For example, the 5/8 spam count is more than
double the 4/23 count, but the ham count is only about 35% larger.
Therefore the percentages are not directly comparable.
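
As a quick check of those growth figures, here is a small sketch in
Python.  The counts are copied from the "(all messages)" rows above;
the names `runs` and `growth` are just made up for illustration.

```python
# Message counts taken from the "(all messages)" rows of the two runs.
runs = {
    "4/23": {"spam": 109007, "ham": 73161},
    "5/8": {"spam": 240537, "ham": 98702},
}


def growth(kind):
    """Return the 5/8 count divided by the 4/23 count for 'spam' or 'ham'."""
    return runs["5/8"][kind] / runs["4/23"][kind]


spam_growth = growth("spam")
ham_growth = growth("ham")

# Prints: spam: x2.21, ham: x1.35
print(f"spam: x{spam_growth:.2f}, ham: x{ham_growth:.2f}")
```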

Assuming the percentages in the SPAM% and HAM% columns are the
share of messages in each category that hit a given list, then
multiplying each HAM percentage by the ham count at the top of the
column gives the number of ham hits (counts) per list:

4/23

NAME              ham hits?
URIBL_SC_SURBL           4
URIBL_JP_SURBL          24
URIBL_AB_SURBL           8
URIBL_WS_SURBL          74
URIBL_OB_SURBL          88
URIBL_PH_SURBL           0


5/8

NAME              ham hits?
URIBL_SC_SURBL           2
URIBL_JP_SURBL          17
URIBL_AB_SURBL           4
URIBL_WS_SURBL          83
URIBL_OB_SURBL          56
URIBL_PH_SURBL           0

(If my assumption is wrong, please let me know how to correct
it.)
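
For anyone who wants to redo the arithmetic, the following Python
sketch reproduces the two tables of ham hits.  It rests on the same
assumption as above, namely that HAM% is the percentage of the ham
corpus hit by each list; the names `ham_total`, `ham_pct` and
`ham_hits` are hypothetical and not part of any SpamAssassin tool.

```python
# Total ham messages per run, from the "(all messages)" rows.
ham_total = {"4/23": 73161, "5/8": 98702}

# HAM% column per list, copied from the two result tables.
ham_pct = {
    "4/23": {
        "URIBL_SC_SURBL": 0.0055,
        "URIBL_JP_SURBL": 0.0328,
        "URIBL_AB_SURBL": 0.0109,
        "URIBL_WS_SURBL": 0.1011,
        "URIBL_OB_SURBL": 0.1203,
        "URIBL_PH_SURBL": 0.0000,
    },
    "5/8": {
        "URIBL_SC_SURBL": 0.0020,
        "URIBL_JP_SURBL": 0.0172,
        "URIBL_AB_SURBL": 0.0041,
        "URIBL_WS_SURBL": 0.0841,
        "URIBL_OB_SURBL": 0.0567,
        "URIBL_PH_SURBL": 0.0000,
    },
}


def ham_hits(run):
    """Convert each list's HAM% into an estimated ham hit count.

    Assumes HAM% is a percentage of that run's ham corpus, so the
    count is pct / 100 * total, rounded to the nearest message.
    """
    total = ham_total[run]
    return {name: round(pct / 100 * total) for name, pct in ham_pct[run].items()}


for run in ("4/23", "5/8"):
    print(run)
    for name, hits in ham_hits(run).items():
        print(f"{name:<18}{hits:>8}")
```

Because the published percentages are rounded to four decimal
places, the recovered counts are estimates, though with corpora of
this size each one lands within a small fraction of a whole message.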

On a 35% larger ham corpus, WS hit 83 hams versus 74 before.
In a sense that's a step in the wrong direction, but the
differing ham corpora make conclusions difficult.

Jeff C.
