Re: [Dspam-user] Spam Identification Deteriorates with Time

Kenneth Marshall Fri, 20 Mar 2009 06:14:06 -0700

On Thu, Mar 19, 2009 at 04:02:28PM -0500, Jeffrey Taylor wrote:
> Quoting Jaye Mathisen <[email protected]>:
> > I have seen this as well, and I don't know what causes it.
> > 
> > A change of training mode might help, but after a while I gave up, moved my
> > personal account to Google, and voilà, problem solved.
> > 
> > However, I have a customer who uses it very heavily, with a setup that
> > appears similar to my own (I set it up), and yet she is happy as a clam, and
> > it runs just fine, w/o the hiccup.  So I don't know the answer.
> > 
> > On Thu, Mar 19, 2009 at 8:04 AM, Paul Kauf <[email protected]> wrote:
> > 
> > > I have been running dspam for my home domain for about a year now.  Using
> > > dspam with mysql, along with postfix.  After initial training, spam
> > > identification is great - from 93 - 97% accurate.  After a period of time,
> > > maybe 2 - 4 months, accuracy drops dramatically, as if "new" spam is not
> > > being recognized (even after reclassifying spam misses via the web
> > > interface).  It still finds a lot of the spam, but it's like the NEW spam
> > > gets by.  I have then wiped the database and gone through retraining, and
> > > all is well for a period of time again.
> > >
> > > I'm unsure what is happening, and would rather not have to reset everything
> > > every couple months.  I do have the purge-4.1.sql running in cron daily to
> > > purge old signatures, etc.
> > >
> > > Does anyone have any ideas why this occurs, or am I alone in seeing this
> > > problem?  Let me know if I need to provide more information (cfg files,
> > > etc.)
> > >
> 
> I have seen this as well.  I figure it is an arms race.  The spammers are
> intelligent, motivated people with access to the same tools we use.  So they
> are constantly seeking ways around SpamAssassin, DSPAM and other Paul
> Graham-style Bayesian classification schemes, etc.
> 
> Just my $0.02USD,
>   Jeffrey
> 
When we have seen this behavior, it was the result of wildly disparate
amounts of spam versus notspam received, combined with the training mode
chosen. In particular, using TEFT (train on everything) results in many
more tokens being learned for whichever class receives more mail, while
Bayesian filters work best with roughly equal amounts of training spam and
notspam. In these cases, we have the client start from scratch and then use
TOE (train on error) to prevent the unbalanced buildup of tokens. I do not
know if this matches your setup, but if so, give it a try.
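A minimal Python sketch of the difference, counting trained messages as a rough proxy for token buildup. The mail mix (900 spam, 100 innocent) and the error pattern in `misclassified` are hypothetical numbers chosen for illustration, not measurements, and `training_counts` is a made-up helper, not part of DSPAM.

```python
from collections import Counter
from typing import Callable, Sequence


def training_counts(labels: Sequence[str], mode: str,
                    misclassified: Callable[[int, str], bool]) -> Counter:
    """Count how many messages of each class would be trained.

    labels: true class of each message, "spam" or "innocent".
    mode: "teft" trains on every message; "toe" trains only on errors.
    misclassified: hypothetical stand-in for the filter's verdict;
        returns True when message i was classified wrongly.
    """
    if mode not in ("teft", "toe"):
        raise ValueError(f"unsupported mode: {mode}")
    counts = Counter()
    for i, label in enumerate(labels):
        # TEFT learns from everything; TOE learns only from mistakes.
        if mode == "teft" or misclassified(i, label):
            counts[label] += 1
    return counts


# Hypothetical mail mix: nine times as much spam as innocent mail.
labels = ["spam"] * 900 + ["innocent"] * 100


def misclassified(i: int, label: str) -> bool:
    # Hypothetical error pattern (by position in the stream): every 50th
    # spam and every 10th innocent message is misclassified.
    if label == "spam":
        return i % 50 == 0
    return i % 10 == 0


teft = training_counts(labels, "teft", misclassified)
toe = training_counts(labels, "toe", misclassified)
print("TEFT:", dict(teft))
print("TOE: ", dict(toe))
```

Under TEFT the trained ratio simply mirrors the 9:1 mail mix; under TOE it follows the error counts instead (18 spam vs. 10 innocent in this made-up example), so the high-volume class no longer dominates just because more of it arrives.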


Cheers,
Ken


_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
