Apologies in advance for a long message, but I'm a new user of DSPAM, and
I'm trying to figure out the best way to use it.
I recently switched my mail server to DSPAM with the PostgreSQL
backend. I'm using one shared group for all users so that people who
don't bother training still get the benefits.
My retraining method uses the antispam plugin for Dovecot. Basically,
whenever someone moves a message to the Junk folder, the plugin mails a
copy to spam@, which is an alias that looks like this:
spam:"|/usr/bin/dspam --user=dspam --class=spam --source=error"
The issue I have is that dspam seems to need several seconds to
reprocess each one of these. When everyone gets going in the morning
and files all of the previous day's spam, a huge backlog of these
reprocessing jobs builds up, and the load average on the machine jumps
to 3 or 4.
Here's the relevant part of dspam.conf:
TrainingMode toe
TestConditionalTraining on
Feature whitelist
Feature tb=1
Algorithm graham burton
Tokenizer chain
PValue bcr
WebStats on
HashRecMax 1572869
HashAutoExtend on
HashMaxExtents 0
HashExtentSize 49157
HashPctIncrease 10
HashMaxSeek 10
HashConnectionCache 10
Notifications off
PurgeSignature off # Specified in purge.sql
PurgeNeutral 90
PurgeUnused off # Specified in purge.sql
PurgeHapaxes off # Specified in purge.sql
PurgeHits1S off # Specified in purge.sql
PurgeHits1I off # Specified in purge.sql
LocalMX 127.0.0.1
SystemLog on
UserLog on
TrainPristine off
ParseToHeaders on
ServerPID /var/run/dspam/dspam.pid
ServerMode auto
ServerParameters "--deliver=innocent,spam"
ServerIdent "localhost.localdomain"
ProcessorURLContext on
ProcessorBias on
Now, the other way I have thought of to manage this would be to set up
an actual mailbox to receive spam@ and one to receive [EMAIL PROTECTED] Then,
every night, I could run
dspam_train <shareduser> /var/spool/mail/spambox/new /var/spool/mail/notspambox/new
Would this be an appropriate way to offload this processing to a low
usage time?
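As a rough sketch of that nightly idea (not something I have running),
a cron job could instead walk each queue directory and feed the
messages to dspam one at a time with the same flags as the alias above,
so that only one reprocess runs at once. DSPAM_BIN, DSPAM_USER, the
retrain_dir function name, and the use of --class=innocent for the
not-spam queue are my assumptions; the user is passed as a separate
argument after --user, which is the form I understand the DSPAM README
to document.

```shell
#!/bin/sh
# Sketch only: batch retraining from queued messages, one dspam run at a time.
# DSPAM_BIN and DSPAM_USER are hypothetical knobs, not DSPAM settings.
DSPAM_BIN="${DSPAM_BIN:-/usr/bin/dspam}"
DSPAM_USER="${DSPAM_USER:-dspam}"

# retrain_dir CLASS DIR
# Feed every regular file in DIR to dspam as a misclassified message of
# class CLASS. A file is removed only after dspam exits successfully, so
# failed messages stay queued for the next run.
retrain_dir() {
    class="$1"
    dir="$2"
    for msg in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$msg" ] || continue
        if "$DSPAM_BIN" --user "$DSPAM_USER" --class="$class" --source=error < "$msg"; then
            rm -f -- "$msg"
        fi
    done
}

# Example nightly run, using the mailbox paths from the message above:
# retrain_dir spam     /var/spool/mail/spambox/new
# retrain_dir innocent /var/spool/mail/notspambox/new
```

Because the messages are processed sequentially, the total work is the
same as before; it is just moved to a quiet hour and never runs in
parallel.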
One other thing: even after a couple of weeks of using dspam and
training it, it's still not catching much. Should I change my
algorithm? With the load already so high, I don't know that I want to
switch to a heavier-duty tokenizer.
So, after all that, does anyone have any hints?
--
Alex Thurlow
Technical Director
Blastro Networks
email: [EMAIL PROTECTED]