On Wednesday, March 26, 2003 9:13:10 PM "Scott at HobbyLink Japan"
<[EMAIL PROTECTED]> wrote:

> Over the past month, I've noticed that SpamSieve's effectiveness at
> catching spam has dropped considerably.  I've been training it for
> nearly four months now, and it has many thousands of messages to work
> with.
> 
> Originally, it was catching about 90% of incoming spam.  Now that's
> down to around 60-70%.  On some days, I get as much spam in my inbox
> as the software catches.
> 
> I've heard reports that this is due to the nature of the filter
> concept that software like this uses.  I'm wondering if others are
> seeing the same trend, and if pruning the corpus or some other
> technique would be effective in returning it to its formerly
> effective self!

Pruning won't improve accuracy; it's just for saving memory. I've seen
smaller degradations of accuracy when the corpus gets very large. I'm
not exactly sure why this is--it may be that spam is evolving or simply
that the corpus is being diluted. In any case, I would recommend
backing up your Corpus.plist file and then selecting and removing all
the words in the Corpus window. Then re-train using your recent spam
and good messages. (You may need to save up some spam first, if you
have been deleting it.) I did this in late January, and my accuracy
increased from 91.5% to 98.6%, even though the new corpus only had
about 1300 messages.
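
If you would rather script the backup step than copy the file by hand,
here is a minimal Python sketch. The function name backup_corpus and
the timestamped naming scheme are my own illustration, not something
SpamSieve provides, and you have to supply the actual location of
Corpus.plist on your machine yourself.

```python
import shutil
import time
from pathlib import Path


def backup_corpus(corpus_path, backup_dir=None):
    """Copy the corpus file to a timestamped backup and return the new path.

    corpus_path: location of Corpus.plist (supplied by you; not assumed here).
    backup_dir:  optional folder for the copy; defaults to the same folder.
    """
    src = Path(corpus_path).expanduser()
    if not src.is_file():
        raise FileNotFoundError(f"No corpus file at {src}")

    dest_dir = Path(backup_dir).expanduser() if backup_dir else src.parent
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: Corpus-YYYYMMDD-HHMMSS.plist
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"

    shutil.copy2(src, dest)  # copy2 keeps the file's modification time
    return dest


# Example call (the path is a placeholder, replace it with your own):
# backup_corpus("/path/to/Corpus.plist")
```

The original file is left in place, so you can still clear the words
from the Corpus window afterwards and restore the copy if the
re-trained corpus does worse.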

If you want to go over this in more detail, please e-mail me at
<[EMAIL PROTECTED]>.

-- 
Michael Tsai                                 <http://www.c-command.com>

