-------- Original Message --------
> Date: Mon, 28 Jan 2008 20:19:12 -0500 (EST)
> From: "Robbie" <[EMAIL PROTECTED]>
> To: "Steve" <[EMAIL PROTECTED]>
> CC: [email protected]
> Subject: Re: [dspam-users] idea for a good start?

> Is using spam/ham corpora a good idea?
>
It depends. I use them, but I don't just do normal training with dspam_train. I do TONE training with DSPAM, without any whitelisting (although I do use whitelisting in production). If you don't know what TONE training is, look up the term on Google.

Anyway, I have used DSPAM for a very long time and I am used to its group functionality. That is where I started doing pre-training with spam/ham corpora. But with the recently added tokenizer and algorithms in DSPAM, pre-training is not really needed. You will probably get even better results without it.

Not that pre-training is bad in itself, but it usually means that you rely on external spam/ham, and you know how it is with external data: it does not always match what you would classify as ham/spam. So you are basically influencing your statistical data negatively. And the more you do that (the more you train), the harder it is later to correct wrongly classified tokens.

But! If you have a lot of users on your DSPAM installation AND those users are not likely to do any training, then pre-training is a good way to get acceptable results for them. You have to keep your pre-trained data up to date, though. It is important to inject/inoculate spam/ham mails from time to time so that the data stays fresh.

// SteveB

>
> On Mon, January 28, 2008 5:55 pm, Steve wrote:
> > -------- Original Message --------
> >> Date: Mon, 28 Jan 2008 16:40:49 -0500 (EST)
> >> From: "Robbie" <[EMAIL PROTECTED]>
> >> To: [email protected]
> >> Subject: [dspam-users] idea for a good start?
> >
> >> Is there a file that I can download to somehow jump-start DSPAM's IQ
> >> for today's current spam?
> >>
> >> Back in the day when I deployed this, there was a file (I think supplied
> >> by SpamAssassin) that you could import into DSPAM that would help it
> >> classify spam right out of the box, rather than learning from scratch.
> >>
> > I am not aware of any direct data import into DSPAM, at least nothing
> > that is available on the net. DSPAM does not have just one single valid
> > configuration. There are many different ways of configuring the filter
> > engine, and at least some of those configurations affect the way the
> > data is stored in the storage facility. So one single importable dataset
> > would not be enough.
> >
> >> Any ideas where I can find this?
> >>
> > I think what you are referring to are the various spam/ham corpora, right?
> > If that is what you are looking for, then have a look at these links:
> > http://spamassassin.apache.org/publiccorpus/
> > http://untroubled.org/spam/
> > http://darleyconsulting.com/www.annexia.org/spam/files/
> > http://www.dornbos.com/spam01.shtml
> > http://plg.uwaterloo.ca/~gvcormac/treccorpus06/
> > http://plg.uwaterloo.ca/~gvcormac/treccorpus/
> > http://www.cs.cmu.edu/~enron/
> > http://www.iit.demokritos.gr/skel/i-config/downloads/
> > http://wiki.cs.pdx.edu/~psam/CorpusSets
> > http://forgeftp.novell.com//nw-assp/Sample%20Spam%20Database/asspsmpl-0.1/
> > http://foxmail4u.goracer.de/dloads.html#smail
> > http://www.trudgian.net/spamkann/index.php#corpus
> >
> > Let me know if you need more links.
> >
> > // SteveB
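
To illustrate the point that heavy pre-training makes wrongly classified tokens harder to switch later, here is a toy Python sketch. It is not DSPAM's actual algorithm or storage format: `TokenStats` and `corrections_until_ham` are hypothetical names, and the naive ratio spam_hits / (spam_hits + ham_hits) is only an assumed stand-in for whatever the configured algorithm really computes.

```python
from dataclasses import dataclass


@dataclass
class TokenStats:
    """Toy per-token counters (hypothetical, not DSPAM's storage schema)."""
    spam_hits: int = 0
    ham_hits: int = 0

    def spam_probability(self) -> float:
        total = self.spam_hits + self.ham_hits
        if total == 0:
            return 0.5  # unseen token: treat as neutral
        return self.spam_hits / total


def corrections_until_ham(stats: TokenStats, threshold: float = 0.5) -> int:
    """Count the ham trainings needed before the token drops below threshold.

    Mutates `stats` by adding one ham hit per correction.
    Assumes 0 < threshold <= 1, otherwise the loop would never end.
    """
    corrections = 0
    while stats.spam_probability() >= threshold:
        stats.ham_hits += 1
        corrections += 1
    return corrections


# A token seen only a few times as spam is easy to flip ...
lightly_trained = TokenStats(spam_hits=5)
# ... while one hammered in by a large external corpus is not.
heavily_pretrained = TokenStats(spam_hits=500)

print(corrections_until_ham(lightly_trained))
print(corrections_until_ham(heavily_pretrained))
```

In this simplified model the number of corrections grows linearly with the number of mismatched pre-training hits, which is why stale or badly matching corpus data is costly to undo.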
