Re: Public SA Corpus

2004-10-12 Thread Thomas Bolioli
Gerry Doris wrote:
I managed to destroy my bayes database...don't ask.
Since I only run a home system and don't receive a heavy flow of spam I
would really like to skip the wait for bayes to get up to speed.  Is it
recommended to use the public corpus on the SA website or is it too old
for proper training?  Is there a better source of ham/spam to be used for
training?
Gerry
 

The public spam corpus should be broad enough for you in the interim,
although I just checked and it is a little long in the tooth (circa
2/2003). Spam is in large part generic these days, so a public, generic
corpus can get you up and running quickly. As time goes by, the older
spam will be retired and replaced by what is actually coming in. Don't
bother with public ham, though. The ham you feed it should be your own.
Since you get so little spam, you should have no problem training it on
that side from your own mail.
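
For anyone following along, here is a minimal sketch of that split: public corpus for spam, your own mail for ham. It only builds the sa-learn command lines (the --spam, --ham and --mbox options are standard sa-learn switches) and runs them if sa-learn is installed. The function names and the paths in the usage comment are made up for illustration, so adjust them to your setup.

```python
import shutil
import subprocess


def sa_learn_command(kind, path, mbox=False):
    """Build the argument list for one sa-learn run.

    kind is "spam" or "ham"; path is a directory, a single message file,
    or (with mbox=True) an mbox file.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


def train(kind, path, mbox=False):
    """Run sa-learn if it is on PATH; otherwise just show the command."""
    cmd = sa_learn_command(kind, path, mbox)
    if shutil.which("sa-learn") is None:
        print("sa-learn not found; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# Hypothetical usage (paths are placeholders, not real locations):
#   train("spam", "public-corpus/spam")            # unpacked public spam
#   train("ham", "/home/gerry/mail/saved", True)   # your own ham, as mbox
```

Run it as the same user that SpamAssassin scans mail as, otherwise the tokens end up in a different bayes database.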
On a side note, I have a 55K message spam database from email addresses
used in the music industry, environmental and educational markets (not
to mention /. ;-}), so it should have a broad reach. It has been culled
of all viruses and mailing list mail. It could make a decent analysis
corpus for those who want it. Also, Gerry, if you want, I can forward
along or post the most recent spam, about 2-5K messages' worth, for you
to train on. That should be all you need.
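
If the corpus is sitting in one big mbox, pulling out just the most recent few thousand messages could look roughly like the sketch below. It sorts on the Date header using Python's standard mailbox module. This is an assumption about how one might do it, not how the corpus above is actually stored, and the function names and file names are hypothetical. Spam often carries forged or missing dates, so treat the ordering as approximate.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def message_timestamp(msg):
    """Return the Date header as a Unix timestamp, or 0.0 if unusable."""
    try:
        dt = parsedate_to_datetime(str(msg.get("Date", "")))
        if dt.tzinfo is None:
            # No zone given in the header: assume UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.timestamp()
    except (TypeError, ValueError, OverflowError):
        # Missing or malformed dates sort as oldest.
        return 0.0


def newest_messages(mbox_path, out_path, count):
    """Copy the `count` most recent messages into a new mbox, newest first."""
    src = mailbox.mbox(mbox_path)
    recent = sorted(src, key=message_timestamp, reverse=True)[:count]
    # No locking here: fine for a private scratch file, but lock the
    # mailbox first if anything else might be writing to it.
    dst = mailbox.mbox(out_path)
    for msg in recent:
        dst.add(msg)
    dst.flush()
    dst.close()
    src.close()
    return len(recent)


# Hypothetical usage: newest_messages("spam-all.mbox", "spam-recent.mbox", 5000)
```

The resulting file can then be fed to sa-learn with --spam --mbox.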
Tom



Public SA Corpus

2004-10-11 Thread Gerry Doris
I managed to destroy my bayes database...don't ask.

Since I only run a home system and don't receive a heavy flow of spam I
would really like to skip the wait for bayes to get up to speed.  Is it
recommended to use the public corpus on the SA website or is it too old
for proper training?  Is there a better source of ham/spam to be used for
training?


Gerry