Apologies in advance for a long message, but I'm a new user of DSPAM, and I'm trying to figure out the best way to use it.

I recently switched my mail server to DSPAM with the PostgreSQL backend. I'm using one shared group for all users so that people who don't bother training will still get the benefits.

My retraining method uses the antispam plugin for Dovecot. Basically, whenever someone moves a message to the Junk folder, the plugin mails a copy to spam@, which is an alias that looks like this:

spam:"|/usr/bin/dspam --user=dspam --class=spam --source=error"

The issue I have is that dspam seems to need several seconds to reprocess each of these messages. When everyone gets going in the morning and files all of the previous day's spam, a huge backlog of reprocessing jobs builds up, and the load average on the machine jumps to 3 or 4.

Here are the relevant settings from dspam.conf:


TrainingMode toe
TestConditionalTraining on
Feature whitelist
Feature tb=1
Algorithm graham burton
Tokenizer chain
PValue bcr
WebStats on
HashRecMax              1572869
HashAutoExtend          on
HashMaxExtents          0
HashExtentSize          49157
HashPctIncrease 10
HashMaxSeek             10
HashConnectionCache     10
Notifications   off
PurgeSignature  off # Specified in purge.sql
PurgeNeutral   90
PurgeUnused    off # Specified in purge.sql
PurgeHapaxes   off # Specified in purge.sql
PurgeHits1S    off # Specified in purge.sql
PurgeHits1I    off # Specified in purge.sql
LocalMX 127.0.0.1
SystemLog on
UserLog   on
TrainPristine off
ParseToHeaders on
ServerPID              /var/run/dspam/dspam.pid
ServerMode auto
ServerParameters        "--deliver=innocent,spam"
ServerIdent             "localhost.localdomain"
ProcessorURLContext on
ProcessorBias on



Now, the other way I have thought of to manage this would be to set up one actual mailbox to receive spam@ and another to receive [EMAIL PROTECTED]. Then, every night, I could run:

dspam_train <shareduser> /var/spool/mail/spambox/new /var/spool/mail/notspambox/new

Would this be an appropriate way to offload the processing to a low-usage time?
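
In case dspam_train is not the right tool for this, here is a minimal, untested sketch of the alternative I have in mind: a nightly job that walks a queue directory and pipes each message to the same dspam command the spam@ alias uses. The function name retrain_dir and the DSPAM_BIN and DSPAM_USER variables are made up for illustration, and it assumes each queue directory holds one message per file (as a Maildir "new" directory does).

```shell
#!/bin/sh
# Hypothetical nightly batch retrainer: a sketch, not a tested deployment.
# DSPAM_BIN and DSPAM_USER are illustrative names; the dspam flags are the
# ones already used by the spam@ alias.

DSPAM_BIN="${DSPAM_BIN:-/usr/bin/dspam}"
DSPAM_USER="${DSPAM_USER:-dspam}"

# retrain_dir CLASS DIR
# Feed every regular file in DIR to dspam as CLASS (spam or innocent),
# one at a time, and delete a file only after dspam exits successfully.
retrain_dir() {
    class="$1"
    dir="$2"
    for msg in "$dir"/*; do
        # Skip non-files; this also covers the unexpanded glob when DIR is empty.
        [ -f "$msg" ] || continue
        if "$DSPAM_BIN" --user="$DSPAM_USER" --class="$class" --source=error < "$msg"; then
            rm -f "$msg"
        else
            echo "dspam failed on $msg, leaving it queued" >&2
        fi
    done
}
```

The nightly cron job would then call "retrain_dir spam /var/spool/mail/spambox/new" followed by "retrain_dir innocent /var/spool/mail/notspambox/new". Because the messages are processed strictly one after another, at most one dspam process runs at a time, which is the point of moving the work out of the morning rush.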


The other thing is that, even after a couple of weeks of using and training DSPAM, it's still not catching much spam. Should I change my algorithm? With the load already so high, I'm not sure I want to switch to a heavier-duty tokenizer.

So, after all that, does anyone have any hints?


--
Alex Thurlow
Technical Director
Blastro Networks

email: [EMAIL PROTECTED]

