On Wednesday, March 26, 2003 9:13:10 PM "Scott at HobbyLink Japan" <[EMAIL PROTECTED]> wrote:
> Over the past month, I've noticed that SpamSieve's effectiveness at
> catching spam has dropped considerably. I've been training it for
> nearly four months now, and it has many thousands of messages to work
> with.
>
> Originally, it was catching about 90% of incoming spam. Now that's
> down to around 60-70%. On some days, I get as much spam in my inbox
> as the software catches.
>
> I've heard reports that this is due to the nature of the filter
> concept that software like this uses. I'm wondering if others are
> seeing the same trend, and if pruning the corpus or some other
> technique would be effective in returning it to its formerly
> effective self!

Pruning won't improve accuracy; it's just for saving memory. I've seen smaller degradations of accuracy when the corpus gets very large. I'm not exactly sure why this is; it may be that spam is evolving or simply that the corpus is being diluted.

In any case, I would recommend backing up your Corpus.plist file and then selecting and removing all the words in the Corpus window. Then re-train using your recent spam and good messages. (You may need to save up some spam first, if you have been deleting it.) I did this in late January, and my accuracy increased from 91.5% to 98.6%, even though the new corpus only had about 1300 messages.

If you want to go over this in more detail, please e-mail me at <[EMAIL PROTECTED]>.

-- 
Michael Tsai <http://www.c-command.com>
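
For anyone who would rather script the backup step before clearing the corpus, here is a minimal sketch in Python. It is not part of SpamSieve; the function name backup_corpus is hypothetical, and the message above does not say where Corpus.plist lives, so the path must be supplied by you. It simply copies the file to a timestamped name so the old corpus can be restored if re-training does not help.

```python
import shutil
import time
from pathlib import Path


def backup_corpus(corpus_path, backup_dir=None):
    """Copy the corpus file to a timestamped backup and return the new path.

    corpus_path is whatever location your Corpus.plist actually has;
    this sketch does not assume or look up any default location.
    """
    src = Path(corpus_path).expanduser()
    if not src.is_file():
        raise FileNotFoundError(f"No corpus file at {src}")

    # Default to keeping the backup next to the original file.
    dest_dir = Path(backup_dir).expanduser() if backup_dir else src.parent
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Example result: Corpus-20030326-211310.plist
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"

    shutil.copy2(src, dest)  # copy2 also preserves modification times
    return dest
```

Calling backup_corpus("/path/to/Corpus.plist") returns the path of the copy and leaves the original untouched; quit the application first so the file is not being written while it is copied.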