-------- Original Message --------
> Date: Mon, 28 Jan 2008 20:19:12 -0500 (EST)
> From: "Robbie" <[EMAIL PROTECTED]>
> To: "Steve" <[EMAIL PROTECTED]>
> CC: [email protected]
> Subject: Re: [dspam-users] idea for a good start?

> Is using spam/ham corpora a good idea?
> 
It depends. I use them, but I don't just do normal training with dspam_train.
I do TONE training with DSPAM without any whitelisting (although I do use
whitelisting in production). If you don't know what TONE training is, look the
term up on Google.
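
For readers who do not know the term, here is a rough sketch of the idea. It
assumes TONE is meant in the usual sense of "train on or near error": a
message is learned only if the filter gets it wrong or is not confident
enough about it. The ToyFilter class, the tone_train function and the margin
value are made up for this illustration; none of this is DSPAM code or
DSPAM's scoring.

```python
from collections import Counter


class ToyFilter:
    """Tiny stand-in for a statistical filter (hypothetical, not DSPAM)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def score(self, text):
        """Return a value in [0, 1]; above 0.5 leans spam, 0.5 means unknown."""
        probs = []
        for tok in text.lower().split():
            s, h = self.spam[tok], self.ham[tok]
            if s + h:  # ignore tokens the filter has never seen
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5

    def learn(self, text, is_spam):
        """Add the message's tokens to the spam or ham counts."""
        (self.spam if is_spam else self.ham).update(text.lower().split())


def tone_train(filt, corpus, margin=0.2):
    """Learn only messages that are misclassified or within `margin` of 0.5."""
    trained = 0
    for text, is_spam in corpus:
        score = filt.score(text)
        wrong = (score > 0.5) != is_spam
        near = abs(score - 0.5) < margin
        if wrong or near:
            filt.learn(text, is_spam)
            trained += 1
    return trained


corpus = [
    ("cheap pills now", True),
    ("meeting notes attached", False),
    ("cheap pills today", True),
    ("meeting notes tomorrow", False),
]
filt = ToyFilter()
trained = tone_train(filt, corpus)
print(trained)  # only the first two messages are learned
```

The last two messages are skipped because the toy filter already classifies
them correctly and with a comfortable margin, which is the whole point of
this kind of training.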

Anyway... I have used DSPAM for a very long time and I am used to the group
functionality found in DSPAM. This is where I started to do pre-training with
spam/ham corpora.

But with the recently added tokenizer and algorithms in DSPAM, pre-training is
not really needed. You will probably get even better results without it. Not
that pre-training is bad in itself, but it mostly means that you rely on
external spam/ham, and you know how it is with external data: it does not
always match what you would classify as ham/spam. So you are basically
degrading your statistical data. And the more of that training you do, the
harder it is later to correct wrongly classified tokens.
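
To put a rough number on that last point, here is a minimal sketch. It assumes
a simplified per-token probability of spam_hits / (spam_hits + ham_hits),
which is not DSPAM's actual formula, and both function names are invented for
the example.

```python
def spam_probability(spam_hits: int, ham_hits: int) -> float:
    """Simplified per-token spam probability (an assumption, not DSPAM's formula)."""
    total = spam_hits + ham_hits
    return 0.5 if total == 0 else spam_hits / total


def corrections_needed(spam_hits: int, ham_hits: int, threshold: float = 0.5) -> int:
    """Count the ham trainings needed until the token no longer leans spam."""
    corrections = 0
    while spam_probability(spam_hits, ham_hits + corrections) > threshold:
        corrections += 1
    return corrections


# A token that an external corpus wrongly pushed toward spam.
light = corrections_needed(spam_hits=10, ham_hits=0)
heavy = corrections_needed(spam_hits=1000, ham_hits=0)
print(light, heavy)  # prints: 10 1000
```

Under this simplified model, a token seen 10 times in mismatched corpus spam
takes 10 corrections to get back to neutral, while one seen 1000 times takes
1000.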

But! If you have a lot of users on your DSPAM installation AND those users are
unlikely to do any training, then pre-training is a good way to get acceptable
results for them. You do have to keep your pre-trained data up to date,
though: it is important to inject/inoculate spam/ham mails from time to time
so that the data stays fresh.


// SteveB

> 
> On Mon, January 28, 2008 5:55 pm, Steve wrote:
> > -------- Original Message --------
> >> Date: Mon, 28 Jan 2008 16:40:49 -0500 (EST)
> >> From: "Robbie" <[EMAIL PROTECTED]>
> >> To: [email protected]
> >> Subject: [dspam-users] idea for a good start?
> >
> >> Is there a file that I can download to somehow jump-start DSPAM's IQ
> >> for today's current spam?
> >>
> >> Back in the day when I deployed this, there was a file (I think supplied
> >> by SpamAssassin) that you could import into DSPAM that would help it
> >> classify spam right out of the box, rather than learning from scratch.
> >>
> > I am not aware of any direct data import into DSPAM. At least nothing
> > which is available on the net. DSPAM does not have just one single valid
> > configuration. DSPAM has many different ways of configuring the filter
> > engine and at least some of those configurations do affect the way the
> > data is stored in the storage facility. So one single importable dataset
> > would not be enough.
> >
> >
> >> Any ideas where I can find this?
> >>
> > I think what you are referring to are the various spam/ham corpora. Right?
> > If this is what you are looking for, then have a look at these links:
> > http://spamassassin.apache.org/publiccorpus/
> > http://untroubled.org/spam/
> > http://darleyconsulting.com/www.annexia.org/spam/files/
> > http://www.dornbos.com/spam01.shtml
> > http://plg.uwaterloo.ca/~gvcormac/treccorpus06/
> > http://plg.uwaterloo.ca/~gvcormac/treccorpus/
> > http://www.cs.cmu.edu/~enron/
> > http://www.iit.demokritos.gr/skel/i-config/downloads/
> > http://wiki.cs.pdx.edu/~psam/CorpusSets
> > http://forgeftp.novell.com//nw-assp/Sample%20Spam%20Database/asspsmpl-0.1/
> > http://foxmail4u.goracer.de/dloads.html#smail
> > http://www.trudgian.net/spamkann/index.php#corpus
> >
> >
> > Let me know if you need more links.
> >
> >
> > // SteveB
> >
> 
