-------- Original Message --------
> Date: Wed, 6 Aug 2008 09:48:41 +0300
> From: s91066 <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: [dspam-users] Train dspam

> On Wednesday 06 August 2008 00:28:34 Steve wrote:
> > -------- Original Message --------
> >
> > > Date: Tue, 5 Aug 2008 11:14:44 +0300
> > > From: s91066 <[EMAIL PROTECTED]>
> > > To: [email protected]
> > > CC: [EMAIL PROTECTED]
> > > Subject: [dspam-users] Train dspam
> > >
> > > I try to understand how dspam is trained. My setup is IMAP + Maildir,
> > > thus, I have created a Junk and a NoSpam directory at which users add
> > > the spams and the false positive mails respectively.
> > > I run a script every hour in order to collect data. The script trains
> > > dspam as:
> > > dspam --user $USER --class=spam --source=error < $j
> > > where $USER is the username (not the mail address, but the username)
> > > and $j is the file that is spam but is classified as Innocent.
> > >
> > > Now, what I cannot understand is this:
> > > I have a lot of emails with the same subject and almost identical
> > > body. I had trained dspam to handle those emails as errors. However,
> > > I still receive those emails!
> > > Since I do have the emails, I run dspam from command line in order to
> > > see the classification result as:
> > > dspam --mode=notrain --user username --classify --stdout<mail_file
> > > The result was:
> > > X-DSPAM-Result: username; result="Innocent"; class="Innocent";
> > > probability=0.0000; confidence=1.00; signature=489787f2131472612618147
> > >
> > > So, why? The message was fed to dspam just a couple of minutes ago,
> > > with the same command as above (source=error).
> >
> > It is very easy. You called DSPAM with "--mode=notrain". Right? This
> > means that the command will NOT train DSPAM. Right? You call it with
> > "--classify" and with "--stdout". This means that the command will
> > CLASSIFY the message and print out the output (the whole output) to
> > the screen. Right? Do you see the result having a signature?
> > Now ask yourself how the signature got there even when you told DSPAM
> > to NOT TRAIN and you told DSPAM to CLASSIFY. And what does DSPAM do?
> > It prints out a signature. But a signature is only created when the
> > message gets tagged, and tagging should not happen with
> > "--mode=notrain --classify". Hmmm... well... very easy: The message
> > you feed to DSPAM ALREADY HAS A SIGNATURE. That is the problem. Could
> > you try this and tell me what the outcome is:
> >
> > sed "/^\(X\-Quarantine\-ID:\|X\-OSBF\-Lua\-Score:\|X\-CRM114\-[a-zA-Z]*:\|X\-\(DKIM\|SenderID\):\|X\-Virus\-Scanned:\|X\-Greylist:\|X\-DCC\-.*\-Metrics:\|X\-\(Virus\|Pyzor\|Razor\)\-Status:\|X\-Delivery\-Agent:\|Received\-SPF:\|X\-policyd\-weight:\|X\-Spam\-[^:]*:\) .*$/d;/^X\-Amavis\-OS\-Fingerprint:/,+1d;/^X\-DSPAM\-Result\:/,/^X\-DSPAM\-Signature: [0-9a-f,]*$/d;s/^Subject: \(\(ADV\|UNS\):[\t ]\{1,99\}\)\{0,1\}\(\[[+-]\{1,2\}\][\t ]\{1,99\}\)\{0,1\}\(\[SPAM\][\t ]\{1,99\}\)\{0,1\}/Subject: /;s/\0-9]\{0,9\},\{0,1\}[0-9a-f]\{1,32\}\!//g" mail_file | dspam --user username --classify --stdout --mode=notrain --deliver=summary
> >
> > Do you still get DSPAM reporting that message as Innocent? Probably
> > not. Right?
> >
> > > Shouldn't dspam report the file as spam?
> >
> > No. See above.
> >
> > > Thank you
> > > Peter
> >
> > Steve
>
> OK, let me be more 'verbose':
> mail_file: A mail file that is already scanned by DSPAM. User has moved
> the file to .Junk IMAP folder. So, the mail file, naturally, has a
> signature. Thus, the mail should either be classified as SPAM or not.
> The output of the sed command is:
> As user: [EMAIL PROTECTED]:
> X-DSPAM-Result: [EMAIL PROTECTED]; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=1.00; signature=N/A
> As user [EMAIL PROTECTED]:
> X-DSPAM-Result: [EMAIL PROTECTED]; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=1.00; signature=N/A
> As unix_username:
> X-DSPAM-Result: gina; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=0.99; signature=N/A
>
> So, in any case, dspam handles a spam message (which was fed to dspam
> with the
> ** dspam --user $USER --class=spam --source=error < $j) as innocent.
>
> So, I do not believe that dspam is trained. Moreover, for the majority
> of users, both your sed command and my method report the mails as
> innocent.
> I have in my collective junk folder over 270 emails, just from today!
> All of them were fed to the command, but dspam reports the same files
> as "Innocent"! Any ideas?

Please post:
- the content of mail_file
- the content of your dspam.conf
- the output of: dspam --version
- the output of:
  cat "$(dspam --version 2>&1|sed -n "s:.*\-\-with\-dspam\-home=\([^'\"]*\).*:\1:p")"/group
- the output of:
  cat "$(dspam --version 2>&1|sed -n "s:.*\-\-sysconfdir=\([^'\"]*\).*:\1:p")"/group

> Peter

Steve
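
To check the point made earlier in the thread (that a stored message may already carry DSPAM tagging), here is a minimal sketch. The helper names dspam_signature and strip_dspam_headers are hypothetical and not part of DSPAM. They are also much cruder than the long sed command quoted above: they only look at X-DSPAM-* lines in the header block and assume headers end at the first empty line.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with DSPAM): inspect a stored message
# for DSPAM tagging before feeding it back for classification.

# Print the value of the X-DSPAM-Signature header, if the message has one.
dspam_signature() {
    # Quit at the first empty line so the body is never searched.
    sed -n -e '/^$/q' -e 's/^X-DSPAM-Signature: *//p' "$1"
}

# Write the message to stdout without any X-DSPAM-* header lines.
strip_dspam_headers() {
    awk '
        in_body          { print; next }                # body: copy as-is
        /^$/             { in_body = 1; print; next }   # end of headers
        /^X-DSPAM-/      { skip = 1; next }             # drop the header
        skip && /^[ \t]/ { next }                       # and folded lines
                         { skip = 0; print }
    ' "$1"
}
```

Possible usage, reusing only the dspam options that already appear in this thread: run `dspam_signature mail_file` to see whether the file is tagged, then `strip_dspam_headers mail_file | dspam --user username --classify --stdout --mode=notrain` to classify the untagged copy.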
