So you are assuming that

    @t=sort {abs($b-.5)<=>abs($a-.5)} @t;

is not working correctly?

Thomas




Wim Borghs <wim.bor...@gmail.com>
26.03.2009 12:31

Please reply to: ASSP development mailing list <assp-test@lists.sourceforge.net>
To: ASSP development mailing list <assp-test@lists.sourceforge.net>
Cc:
Subject: [Assp-test] absurd misclassifications by bayesian check






We've been seeing some really absurd classifications by the bayesian
check. Mails that are obviously not spam are classified as spam, and vice
versa.
I think it started when we switched to the version 2 branch.

After having looked into it, it seems to me that the sort in sub BayesOK
doesn't work correctly in about half the cases. The sort in that sub is
meant to pick the most interesting bayesian scores for calculating the
spam probability and bayesian confidence. So if the sort fails, those
probability and confidence scores are based on random token pairs instead
of the most interesting ones.
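
To make the intent of that sort concrete, here is a minimal sketch outside of assp. The token probabilities and the cutoff of 3 are invented for illustration; only the comparator is the one from sub BayesOK. A score counts as more "interesting" the further it is from 0.5.

```perl
use strict;
use warnings;

# Hypothetical token probabilities; the values are invented for illustration.
my @t = (0.50, 0.99, 0.45, 0.02, 0.7, 0.51);

# Same comparator as in sub BayesOK: largest distance from 0.5 first.
@t = sort { abs($b - .5) <=> abs($a - .5) } @t;

# Keep only the most "interesting" scores; the cutoff of 3 is an assumption.
my $keep = 3;
my @top  = @t[0 .. $keep - 1];

print join(", ", @top), "\n";
```

With these sample values the scores closest to 0 or 1 end up at the front, and the near-neutral ones (0.45, 0.51, 0.50) are left out.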

I do know this is absurd, ridiculous, whatever... and part of me feels I
must be making some mistake and will be ashamed when I discover what it
is...
After all, Perl is far from obscure, the ActiveState implementation is far
from obscure, and assp isn't an obscure piece of software.
So if there were a bug like this somewhere, it would have been found and
fixed by now.
But nonetheless, this is what it looks like to me now.

I added this code to sub BayesOK to check whether @t was sorted correctly
and to log the result:
    $itime=time-$stime; mlog($fh,"info: Bayesian-Check has taken $itime seconds") if $BayesianLog == 2;
    my $index;
    for ($index=0; $index<(@t-1); $index++) {
        # after a descending sort, the distance from 0.5 must never increase
        if (abs($t[$index]-0.5) < abs($t[$index+1]-0.5)) {
            $this->{spamprob}=0;
            mlog($fh, "Sort-in-sub-BayesOK-incorrect at element $index!!!");
            last;
        }
    }
    # the loop ran to the end without finding an out-of-order pair
    if ($index == @t-1) {
        mlog($fh, "Sort-in-sub-BayesOK-correct");
    }
    return $this->{spamprob}=0 if $DoBayesian==2;

This should always log "Sort-in-sub-BayesOK-correct", I think.
But it logs "incorrect" in slightly more than half the cases.
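
One way to narrow this down might be to run the same sort and the same check in a standalone script, away from assp. The sketch below does that on random data. The helper name first_unsorted_index, the 1000 runs, and the list length of 50 are arbitrary choices for illustration and not taken from assp.

```perl
use strict;
use warnings;

# Returns the index of the first out-of-order element, or -1 if the list
# is sorted by descending distance from 0.5. (Helper name is hypothetical.)
sub first_unsorted_index {
    my @v = @_;
    for my $i (0 .. $#v - 1) {
        return $i if abs($v[$i] - 0.5) < abs($v[$i + 1] - 0.5);
    }
    return -1;
}

# Sort random lists with the comparator from sub BayesOK and count
# how many of them fail the sortedness check.
my $failures = 0;
for my $run (1 .. 1000) {
    my @t = map { rand() } 1 .. 50;
    @t = sort { abs($b - .5) <=> abs($a - .5) } @t;
    $failures++ if first_unsorted_index(@t) >= 0;
}
print "runs with a sort failure: $failures of 1000\n";
```

If this reports no failures on the affected machine, the plain sort is probably fine there, and it may be more useful to look at what @t contains, or at the code around the sort inside assp, than at Perl itself.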

Unfortunately we're very busy here so I don't have a lot of time to spend
on this.
Some things I thought about that I can check:
- Does the version 1 branch have the same issue?
- Do other uses of sort in assp show the same issue?

I tried uninstalling and reinstalling
ActivePerl-5.10.0.1004-MSWin32-x86-287188.msi, but that didn't fix it.
I also tried ActivePerl v5.8, and it shows the same issue.
------------------------------------------------------------------------------
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test



