http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4100





------- Additional Comments From [EMAIL PROTECTED]  2007-04-19 04:22 -------
Well, that *really* didn't improve the output, I'm afraid ;)
Given these logs:

ham.log:
Y  1 /file2 B,D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes

spam.log:
Y  1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B time=1157091898,scantime=6,format=m,reuse=yes




Here's the output of the old (non-IG) rank measure
(./hit-frequencies -x -p -c=/dev/null):

OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
      0        5        5    0.500   0.00    0.00  (all messages)
0.00000  50.0000  50.0000    0.500   0.00    0.00  (all messages as %)
 50.000  100.0000   0.0000    1.000   1.00    0.00  A
 30.000  60.0000   0.0000    1.000   0.75    0.00  C
 60.000  100.0000  20.0000    0.833   0.75    0.00  B
 50.000   0.0000  100.0000    0.000   0.00    0.00  D

as you can see, A is the best rule so should be at the top.  B should
probably be higher than C.  D should be last (since it's a very good ham
rule).
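
For anyone who wants to check the SPAM%, HAM% and S/O columns by hand, here is
a minimal Python sketch that recomputes them from the two logs above. It is not
the hit-frequencies code: the helper names (rule_hits, frequencies) are made up
for illustration, it assumes the rule list is the fourth whitespace-separated
field of each log line, and it takes S/O to be SPAM% / (SPAM% + HAM%), which is
consistent with the table above.

```python
from collections import Counter

# The two sample logs from this comment, verbatim.
HAM_LOG = """\
Y  1 /file2 B,D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
"""

SPAM_LOG = """\
Y  1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B time=1157091898,scantime=6,format=m,reuse=yes
Y  1 /file1 A,B time=1157091898,scantime=6,format=m,reuse=yes
"""


def rule_hits(log_text):
    """Return (message count, Counter mapping rule name -> messages hit)."""
    hits = Counter()
    total = 0
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        total += 1
        # Assumption: field 4 is the comma-separated list of rules that hit.
        hits.update(set(fields[3].split(",")))
    return total, hits


def frequencies(spam_log, ham_log):
    """Map each rule to (SPAM%, HAM%, S/O)."""
    n_spam, spam_hits = rule_hits(spam_log)
    n_ham, ham_hits = rule_hits(ham_log)
    table = {}
    for rule in sorted(set(spam_hits) | set(ham_hits)):
        spam_pct = 100.0 * spam_hits[rule] / n_spam
        ham_pct = 100.0 * ham_hits[rule] / n_ham
        # Assumed S/O definition; every rule here hits at least one message,
        # so the denominator is never zero.
        table[rule] = (spam_pct, ham_pct, spam_pct / (spam_pct + ham_pct))
    return table


if __name__ == "__main__":
    for rule, (s, h, so) in frequencies(SPAM_LOG, HAM_LOG).items():
        print(f"{s:8.4f} {h:8.4f} {so:7.3f}  {rule}")
```

The RANK column is deliberately left out, since its formula is not given here.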

old IG output (./hit-frequencies -x -p -i -c=/dev/null):

OVERALL    SPAM%     HAM%     S/O      IG   SCORE  NAME
      0        5        5    0.500   0.00    0.00  (all messages)
0.00000  50.0000  50.0000    0.500   0.00    0.00  (all messages as %)
 50.000  100.0000   0.0000    1.000   1.00    0.00  A
 50.000   0.0000  100.0000    0.000   1.00    0.00  D
 60.000  100.0000  20.0000    0.833   0.35    0.00  B
 30.000  60.0000   0.0000    1.000   0.00    0.00  C

not bad: B is listed higher than C, which is what we want.  but there's
a bug in that D is treated as equally good as A; really, it should be at
the end.
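
The A/D tie is what you would expect from plain information gain, which only
measures how well a rule separates the two classes and not which class it
points to. The sketch below uses the textbook Shannon definition with the hit
counts read off the logs above. It is an assumption that this resembles what
-i computes; the absolute values for B and C need not match the IG column, and
the function names are made up for illustration.

```python
import math


def entropy(pos, neg):
    """Shannon entropy in bits of a two-class split with the given counts."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:  # a zero count contributes nothing (and avoids log2(0))
            p = n / total
            h -= p * math.log2(p)
    return h


def info_gain(spam_hit, ham_hit, n_spam, n_ham):
    """Information gain of 'rule hit / rule missed' about spam vs. ham."""
    total = n_spam + n_ham
    hit = spam_hit + ham_hit
    miss = total - hit
    conditional = 0.0
    if hit:
        conditional += hit / total * entropy(spam_hit, ham_hit)
    if miss:
        conditional += miss / total * entropy(n_spam - spam_hit,
                                              n_ham - ham_hit)
    return entropy(n_spam, n_ham) - conditional


# (spam hits, ham hits) per rule, counted from the 5-spam / 5-ham logs above.
COUNTS = {"A": (5, 0), "B": (5, 1), "C": (3, 0), "D": (0, 5)}
gains = {rule: info_gain(s, h, 5, 5) for rule, (s, h) in COUNTS.items()}

if __name__ == "__main__":
    for rule, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{gain:.3f}  {rule}")
```

A and D both split the corpus perfectly, so both come out at the maximum of
1.0, and B scores above C, the same ordering as the old IG output. Ranking D
last would need some extra signal for direction, such as S/O.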

new algorithm (same cmdline):

OVERALL    SPAM%     HAM%     S/O      IG   SCORE  NAME
      0        5        5    0.500   0.00    0.00  (all messages)
0.00000  50.0000  50.0000    0.500   0.00    0.00  (all messages as %)
 50.000  100.0000   0.0000    1.000   1.00    0.00  A
 30.000  60.0000   0.0000    1.000   0.60    0.00  C
 50.000   0.0000  100.0000    0.000   0.00    0.00  D
 60.000  100.0000  20.0000    0.833   0.00    0.00  B

there's a problem here in that B is listed last.  not sure why...
I'll attach the patch to hit-frequencies if anyone wants to have 
a look.





