Tony Earnshaw wrote:
> Erland Nylend wrote, on 20. mar 2007 14:41:
> 
> [...]
> 
>> I'm using one shared group, with the "global" user as parent, like
>> this: global:shared:*
>>
>> I've set the learning mode in dspam.conf to toe, and these are the
>> preferences for the users:
>>
>> | mysql> select * from dspam_preferences;
>> | +-----+--------------+-------+
>> | | uid | preference   | value |
>> | +-----+--------------+-------+
>> | |  11 | optin        | on    |
>> | |  11 | trainingMode | toe   |
>> | |  13 | optin        | on    |
>> | +-----+--------------+-------+
>> | 3 rows in set (0.00 sec)
>>
>> (uid 11 is the global user, and the other one is the one I'm sending
>> ham/spam to)
>>
>> I've done some initial training of the global user, and dspam seems
>> to work as expected when sending mail to myself (uid 13). My problem
>> is that when dspam misses spam, and I want to notify dspam about the
>> errors, it does not work.
>>
>> This is the command I am using:
>> ~# dspam --user global --class=spam --source=error --signature=11,45ffaeec41871548770753
>>
>> I see no change in dspam_stats, and I cannot see any improvement in
>> how dspam filters the spam messages, either.
>> Anyone on the list who could offer some tips?
> 
> [...]
> 
> Well, I might be able to.
> 
> 1: Please revise the recent thread on this list between Lars Stavholm
> ([EMAIL PROTECTED]) and myself, which dealt with exactly the same thing.
> 
> 2: Basically, if you use a *shared* group (which all my sites do), you
> can't initiate any other user than the user of the shared group itself,
> in your case user "global". So you can't expect any other user to be
> existent in the group, you can't have any other user in your setup than
> uid 11. If you do, it simply won't work, as you've found out for
> yourself. You can't submit data in the name of a user that doesn't exist.
> 
> You send mail to uid 13, fair enough, the user receives the mail - this
> is a factor of your MTA and (if you have one) your MLA. But dspam will
> not retrain under any other uid than that of the shared group.
> 
> If you can peruse the exchange between Lars and me on the same subject,
> then that would be the best. Otherwise, we can take it from step 1: again.

I posted my complete setup under another thread,
but for your convenience, here it is again:

Postfix -> DSPAM -> Cyrus IMAP

# dspam --version
DSPAM Anti-Spam Suite 3.6.8 (agent/library)
Copyright (c) 2002-2006 Jonathan A. Zdziarski
http://dspam.nuclearelephant.com
DSPAM may be copied only under the terms of the GNU General Public
License, a copy of which can be found with the DSPAM distribution kit.
Configuration parameters: --prefix=/usr --sysconfdir=/etc
--with-dspam-home=/var/lib/dspam --mandir=/usr/share/man --enable-daemon
--enable-debug --enable-clamav --enable-syslog --enable-homedir

# cat /var/lib/dspam/group
users:shared:[EMAIL PROTECTED]

# egrep -v '^#|^$' /etc/dspam.conf
Home /var/lib/dspam
TrustedDeliveryAgent "/usr/lib/cyrus/bin/deliver"
DeliveryHost        127.0.0.1
DeliveryPort        10026
DeliveryIdent       localhost
DeliveryProto       SMTP
OnFail error
Trust root
Trust mail
Trust dspam
Trust wwwrun
TrainingMode teft
TestConditionalTraining on
Feature noise
Feature chained
Feature whitelist
Algorithm graham burton
PValue graham
ImprobabilityDrive on
Preference "spamAction=deliver"
Preference "signatureLocation=headers"  # 'message' or 'headers'
Preference "showFactors=off"
AllowOverride trainingMode
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride signatureLocation
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold
HashRecMax              98317
HashAutoExtend          on
HashMaxExtents          0
HashExtentSize          49157
HashMaxSeek             100
HashConnectionCache     10
Lookup  "rabl.nuclearelephant.com"
RBLInoculate on
Notifications   off
PurgeSignatures 14
PurgeNeutral    90
PurgeUnused     90
PurgeHapaxes    30
PurgeHits1S     15
PurgeHits1I     15
LocalMX 127.0.0.1
SystemLog on
UserLog   off
TrainPristine on
Opt out
Broken lineStripping
ClamAVPort      3310
ClamAVHost      127.0.0.1
ClamAVResponse  spam
ServerPID              /var/run/dspam.pid
ServerMode auto
ServerParameters        "--deliver=innocent,spam -d %u"
ServerIdent             "mail.domain.tld"
ServerDomainSocketPath  "/var/tmp/dspam.sock"
ClientHost      /var/tmp/dspam.sock
ProcessorBias on

With this setup, however, the webui doesn't work,
except for the global statistics page.

As you can see, we use the hash driver and shared groups,
and it works like a charm.

For user mail training we use a simple script that collects
misclassified ham/spam on an hourly basis from dedicated
user IMAP folders like so:

#!/bin/bash
# $Id: dspam_learn.sh.in 1971 2007-03-16 22:18:02Z stava $
# @(#) Look for user/$user/Spam/train/{ham,spam}; if all those directories
# @(#) exist and there's at least one mail message to learn from,
# @(#) perform the training and the subsequent cleanup (remove the mails).

id="`id | cut -d= -f2 | cut -d\( -f1`"
[ "$id" = "0" ] || { echo >&2 "$0: must be root"; exit 1; }

# look here for cyrus imap users...
basedir="/var/spool/imap/user"

# establish working directory...
cd /var/tmp

# loop through all users...
for u in $basedir/*; do
  user="`basename $u`"; ham=; spam=
  # if all user directories (folders) exist, and only then...
  [ -d $u/Spam ] && [ -d $u/Spam/train ] && \
  [ -d $u/Spam/train/ham ] && [ -d $u/Spam/train/spam ] && {
    ls $u/Spam/train/ham/[0-9]*. &> /dev/null && {
      echo -n "ham: "
      for mail in $u/Spam/train/ham/[0-9]*.; do
        echo -n "`basename $mail`"
        sed '/^X-DSPAM-/d' $mail | \
          dspam --user users --class=innocent --deliver=innocent --source=error
        [ $? = 0 ] && rm $mail
      done
      echo ""
      ham=.
    }
    ls $u/Spam/train/spam/[0-9]*. &> /dev/null && {
      echo -n "spam: "
      for mail in $u/Spam/train/spam/[0-9]*.; do
        echo -n "`basename $mail`"
        sed '/^X-DSPAM-/d' $mail | \
          dspam --user users --class=spam --deliver=spam --source=error
        [ $? = 0 ] && rm $mail
      done
      echo ""
      spam=.
    }
    # tell cyrus that we removed some mail messages...
    [ $ham  ] && su - cyrus -c "reconstruct -r user/$user/Spam/train/ham"
    [ $spam ] && su - cyrus -c "reconstruct -r user/$user/Spam/train/spam"
  }
done
exit 0
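
Before letting a script like this delete anything, it can help to see
what is queued. The following is only a sketch of a dry run, not part of
our setup: count_pending is a made-up helper name, and it assumes the same
Spam/train/{ham,spam} folder layout and the same [0-9]*. message file
naming as the script above. It never calls dspam and never removes mail.

```shell
#!/bin/bash
# Dry run: count queued training messages per user (hypothetical helper).
# Assumes the Cyrus spool layout used by the training script above.
count_pending() {
  local basedir="${1:-/var/spool/imap/user}" u kind n f
  for u in "$basedir"/*; do
    # skip users without the training folders
    [ -d "$u/Spam/train" ] || continue
    for kind in ham spam; do
      n=0
      for f in "$u/Spam/train/$kind"/[0-9]*.; do
        # an unmatched glob stays literal, so test that the file exists
        if [ -f "$f" ]; then n=$((n + 1)); fi
      done
      echo "$(basename "$u") $kind $n"
    done
  done
}

# usage: count_pending            (default spool path)
#        count_pending /some/dir  (alternative base directory)
```

Each output line is "user kind count", so the result is easy to grep
or to mail to yourself from cron.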

This all works beautifully now. After a few days only,
just a few hundred mails, on a low volume site, we get:

# dspam_stats -H
users:
                TP True Positives:            136
                TN True Negatives:            392
                FP False Positives:             5
                FN False Negatives:            33
                SC Spam Corpusfed:              0
                NC Nonspam Corpusfed:           0
                TL Training Left:            2103
                SHR Spam Hit Rate          80.47%
                HSR Ham Strike Rate:        1.26%
                OCA Overall Accuracy:      93.29%

...where the Overall Accuracy is climbing rapidly.
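
For anyone wondering how the three percentages relate to the raw
counters: the figures above are consistent with the usual definitions,
SHR = TP/(TP+FN), HSR = FP/(FP+TN) and OCA = (TP+TN)/(all four). That is
inferred from the numbers shown here, not taken from the DSPAM source,
and dspam_rates is just an illustrative name. A small sketch to recompute
them:

```shell
#!/bin/bash
# Recompute the dspam_stats percentages from the raw counters.
# Formulas are an assumption inferred from the output above.
dspam_rates() {
  # arguments: TP TN FP FN (the sums used as divisors must be non-zero)
  awk -v TP="$1" -v TN="$2" -v FP="$3" -v FN="$4" 'BEGIN {
    printf "SHR %.2f%%\n", 100 * TP / (TP + FN)
    printf "HSR %.2f%%\n", 100 * FP / (FP + TN)
    printf "OCA %.2f%%\n", 100 * (TP + TN) / (TP + TN + FP + FN)
  }'
}

# counters from the dspam_stats -H output above
dspam_rates 136 392 5 33
```

With the counters above this gives 80.47%, 1.26% and 93.29%, matching
what dspam_stats reports.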

Kudos to Tony who helped me to get thus far.

If of any use, our dspam is packaged as an rpm which
works right-out-of-the-box on a SuSE Linux 10.1 platform:
<http://www.linadd.org/download/mail/dspam-3.6.8-1.i586.rpm>.

Hope this helps
/Lars
