http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4100
------- Additional Comments From [EMAIL PROTECTED] 2007-04-19 04:22 -------
Well, that *really* didn't improve the output, I'm afraid ;)
Given these logs:
ham.log:
Y 1 /file2 B,D time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file2 D time=1157091898,scantime=6,format=m,reuse=yes
spam.log:
Y 1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file1 A,B,C time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file1 A,B time=1157091898,scantime=6,format=m,reuse=yes
Y 1 /file1 A,B time=1157091898,scantime=6,format=m,reuse=yes
Here's what the old (non-IG) rank measure outputs (./hit-frequencies -x -p
-c=/dev/null):
OVERALL     SPAM%      HAM%    S/O  RANK  SCORE  NAME
      0         5         5  0.500  0.00   0.00  (all messages)
0.00000   50.0000   50.0000  0.500  0.00   0.00  (all messages as %)
 50.000  100.0000    0.0000  1.000  1.00   0.00  A
 30.000   60.0000    0.0000  1.000  0.75   0.00  C
 60.000  100.0000   20.0000  0.833  0.75   0.00  B
 50.000    0.0000  100.0000  0.000  0.00   0.00  D
As you can see, A is the best rule, so it should be at the top. B should
probably be higher than C. D should be last (since it's a very good ham
rule).
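
For reference, the OVERALL, SPAM%, HAM% and S/O columns follow directly from the two logs. Here is a minimal Python sketch, not the hit-frequencies code itself. It assumes the fourth whitespace-separated field of each log line is the comma-separated rule list, and that S/O is SPAM% / (SPAM% + HAM%), which agrees with the figures shown (e.g. 100 / (100 + 20) = 0.833 for B). The names rule_hits and frequencies are made up for illustration, and RANK is not attempted.

```python
from collections import Counter

# The two logs from this comment, verbatim.
META = "time=1157091898,scantime=6,format=m,reuse=yes"
HAM_LOG = "\n".join(
    [f"Y 1 /file2 B,D {META}"] + [f"Y 1 /file2 D {META}"] * 4
)
SPAM_LOG = "\n".join(
    [f"Y 1 /file1 A,B,C {META}"] * 3 + [f"Y 1 /file1 A,B {META}"] * 2
)


def rule_hits(log_text):
    """Return (message_count, Counter mapping rule -> messages it hit)."""
    hits = Counter()
    total = 0
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        total += 1
        # Assumption: field 4 is the comma-separated list of rules hit.
        hits.update(set(fields[3].split(",")))
    return total, hits


def frequencies(spam_log, ham_log):
    """Return {rule: (overall%, spam%, ham%, s/o)} for every rule seen."""
    n_spam, spam_hits = rule_hits(spam_log)
    n_ham, ham_hits = rule_hits(ham_log)
    rows = {}
    for rule in sorted(set(spam_hits) | set(ham_hits)):
        spam_pct = 100.0 * spam_hits[rule] / n_spam
        ham_pct = 100.0 * ham_hits[rule] / n_ham
        overall = 100.0 * (spam_hits[rule] + ham_hits[rule]) / (n_spam + n_ham)
        # Assumed definition of S/O; a rule listed here hit at least one
        # message, so the denominator is never zero.
        s_o = spam_pct / (spam_pct + ham_pct)
        rows[rule] = (overall, spam_pct, ham_pct, s_o)
    return rows


for rule, (overall, spam_pct, ham_pct, s_o) in frequencies(SPAM_LOG, HAM_LOG).items():
    print(f"{overall:7.3f} {spam_pct:9.4f} {ham_pct:9.4f} {s_o:6.3f}  {rule}")
```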
old IG output (./hit-frequencies -x -p -i -c=/dev/null):
OVERALL     SPAM%      HAM%    S/O    IG  SCORE  NAME
      0         5         5  0.500  0.00   0.00  (all messages)
0.00000   50.0000   50.0000  0.500  0.00   0.00  (all messages as %)
 50.000  100.0000    0.0000  1.000  1.00   0.00  A
 50.000    0.0000  100.0000  0.000  1.00   0.00  D
 60.000  100.0000   20.0000  0.833  0.35   0.00  B
 30.000   60.0000    0.0000  1.000  0.00   0.00  C
Not bad. There's a bug in that D is treated as equally good as A, when
really it should be at the end. On the plus side, B is listed higher than C.
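
The A/D tie is what plain information gain would be expected to produce: it measures how well a rule separates the two classes, not which class it points to. Here is a minimal Python sketch of the textbook definition (class entropy minus class entropy conditioned on hit/miss), using the hit counts from the logs above. This is an assumption about what "IG" means, not the hit-frequencies implementation. It gives A = D = 1.0, but roughly 0.61 for B and 0.40 for C rather than the 0.35 and 0.00 shown, so the tool evidently computes or scales its figure differently.

```python
from math import log2


def entropy(pos, neg):
    """Shannon entropy in bits of a two-class count split."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(spam_hit, ham_hit, n_spam, n_ham):
    """Textbook IG of a rule: H(class) - H(class | rule hit or missed)."""
    total = n_spam + n_ham
    hit = spam_hit + ham_hit
    miss = total - hit
    h_class = entropy(n_spam, n_ham)
    h_cond = (hit / total) * entropy(spam_hit, ham_hit) + (
        miss / total
    ) * entropy(n_spam - spam_hit, n_ham - ham_hit)
    return h_class - h_cond


# (spam hits, ham hits) per rule, counted from the logs in this comment.
COUNTS = {"A": (5, 0), "B": (5, 1), "C": (3, 0), "D": (0, 5)}

for rule, (spam_hit, ham_hit) in COUNTS.items():
    print(rule, round(information_gain(spam_hit, ham_hit, 5, 5), 3))
```

Because the measure is symmetric in spam and ham, any ordering that wants D last has to bring in the direction of the rule separately, for example via S/O.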
New algorithm (same cmdline):
OVERALL     SPAM%      HAM%    S/O    IG  SCORE  NAME
      0         5         5  0.500  0.00   0.00  (all messages)
0.00000   50.0000   50.0000  0.500  0.00   0.00  (all messages as %)
 50.000  100.0000    0.0000  1.000  1.00   0.00  A
 30.000   60.0000    0.0000  1.000  0.60   0.00  C
 50.000    0.0000  100.0000  0.000  0.00   0.00  D
 60.000  100.0000   20.0000  0.833  0.00   0.00  B
There's a problem here in that B is listed last. Not sure why...
I'll attach the patch to hit-frequencies if anyone wants to have
a look.