Re: [dspam-users] Train dspam

Steve Wed, 06 Aug 2008 12:44:24 -0700

-------- Original Message --------
> Date: Wed, 6 Aug 2008 09:48:41 +0300
> From: s91066 <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: [dspam-users] Train dspam


> On Wednesday 06 August 2008 00:28:34 Steve wrote:
> > -------- Original Message --------
> >
> > > Date: Tue, 5 Aug 2008 11:14:44 +0300
> > > From: s91066 <[EMAIL PROTECTED]>
> > > To: [email protected]
> > > CC: [EMAIL PROTECTED]
> > > Subject: [dspam-users] Train dspam
> > >
> > > I am trying to understand how dspam is trained. My setup is IMAP +
> > > Maildir, so I have created a Junk and a NoSpam folder into which users
> > > move missed spam and false positives, respectively.
> > > I run a script every hour to collect these messages. The script trains
> > > dspam with:
> > > dspam --user $USER --class=spam --source=error < $j
> > > where $USER is the username (not the mail address) and $j is a message
> > > file that is spam but was classified as Innocent.
> > >
> > > Now, what I cannot understand is this:
> > > I have a lot of emails with the same subject and almost identical
> > > bodies. I have trained dspam to treat those emails as errors. However,
> > > I still receive them!
> > > Since I have the emails, I ran dspam from the command line to see the
> > > classification result:
> > > dspam --mode=notrain --user username --classify --stdout<mail_file
> > > The result was:
> > > X-DSPAM-Result: username; result="Innocent"; class="Innocent";
> > > probability=0.0000; confidence=1.00; signature=489787f2131472612618147
> > >
> > > So, why? The message was fed to dspam just a couple of minutes ago with
> > > the same command as above (source=error).
> >
> > It is very easy. You called DSPAM with "--mode=notrain". Right? This
> > means that the command will NOT train DSPAM. Right? You called it with
> > "--classify" and with "--stdout". This means that the command will
> > CLASSIFY the message and print the output (the whole output) to the
> > screen. Right? Do you see that the result has a signature? Now ask
> > yourself how the signature got there even though you told DSPAM to NOT
> > TRAIN and to CLASSIFY. And what does DSPAM do? It prints out a
> > signature. But a signature is only created when the message gets
> > tagged, and tagging should not happen with "--mode=notrain --classify".
> > Hmmm... well... very easy: the message you fed to DSPAM ALREADY HAS A
> > SIGNATURE. That is the problem. Could you try this and tell me what the
> > outcome is:
> >
> > sed \
> >   -e "/^\(X\-Quarantine\-ID:\|X\-OSBF\-Lua\-Score:\|X\-CRM114\-[a-zA-Z]*:\|X\-\(DKIM\|SenderID\):\|X\-Virus\-Scanned:\|X\-Greylist:\|X\-DCC\-.*\-Metrics:\|X\-\(Virus\|Pyzor\|Razor\)\-Status:\|X\-Delivery\-Agent:\|Received\-SPF:\|X\-policyd\-weight:\|X\-Spam\-[^:]*:\) .*$/d" \
> >   -e "/^X\-Amavis\-OS\-Fingerprint:/,+1d" \
> >   -e "/^X\-DSPAM\-Result\:/,/^X\-DSPAM\-Signature: [0-9a-f,]*$/d" \
> >   -e "s/^Subject: \(\(ADV\|UNS\):[\t ]\{1,99\}\)\{0,1\}\(\[[+-]\{1,2\}\][\t ]\{1,99\}\)\{0,1\}\(\[SPAM\][\t ]\{1,99\}\)\{0,1\}/Subject: /" \
> >   -e "s/\!DSPAM:[0-9]\{0,9\},\{0,1\}[0-9a-f]\{1,32\}\!//g" \
> >   mail_file |
> >   dspam --user username --classify --stdout --mode=notrain --deliver=summary
> >
> >
> > Do you still get DSPAM reporting that message as Innocent? Probably not.
> > Right?
> >
> > > Shouldn't dspam report the file as spam?
> >
> > No. See above.
> >
> > > Thank you
> > > Peter
> >
> > Steve
> >
> > > 
> 
> OK, let me be more 'verbose':
> mail_file: A mail file that has already been scanned by DSPAM. The user
> has moved the file to the .Junk IMAP folder. So the mail file naturally
> has a signature. Thus, the mail should be classified as either SPAM or
> not.
> The output of the sed command is:
>       As user [EMAIL PROTECTED]:
> X-DSPAM-Result: [EMAIL PROTECTED]; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=1.00; signature=N/A
>       As user [EMAIL PROTECTED]:
> X-DSPAM-Result: [EMAIL PROTECTED]; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=1.00; signature=N/A
>       As unix_username:
> X-DSPAM-Result: gina; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=0.99; signature=N/A
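
The signature=N/A in the results above suggests that the sed filter removed
the old DSPAM headers. A quick way to confirm what a stored message still
carries is to grep for them. This is only a sketch: the function name
list_dspam_markers is made up for illustration, and it assumes the markers
are X-DSPAM-* headers plus the inline !DSPAM:...! tag that DSPAM can append
to a message.

```shell
#!/bin/sh
# list_dspam_markers FILE
# Print (with line numbers) every line of FILE that carries a DSPAM header
# or an inline "!DSPAM:...!" tag. Returns non-zero when nothing matches.
# Hypothetical helper name; this is not a tool shipped with DSPAM.
list_dspam_markers() {
    grep -n -E '^X-DSPAM-[A-Za-z]+:|!DSPAM:[0-9]*,?[0-9a-f]+!' "$1"
}
```

It could be run as "list_dspam_markers mail_file || echo 'no DSPAM markers'".
Folded header continuation lines are not shown, only the first line of each
header.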
> 
> 
> So, in any case, dspam treats a spam message (which was fed to dspam with
> ** dspam --user $USER --class=spam --source=error < $j) as innocent.
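
For reference, the hourly retraining described above could be wrapped so
that a failing dspam call becomes visible instead of passing silently. A
minimal sketch, assuming a Maildir-style Junk folder with cur/ and new/
subdirectories; train_junk, DSPAM_BIN and the example path are hypothetical
names, and only the dspam options already quoted in this thread are used.

```shell
#!/bin/sh
# DSPAM_BIN can be overridden, e.g. with a stub for a dry run (assumption).
DSPAM_BIN="${DSPAM_BIN:-dspam}"

# train_junk USER JUNK_DIR
# Feed every message in JUNK_DIR/cur and JUNK_DIR/new to dspam as missed
# spam and print "ok <file>" or "FAILED <file>" based on the exit status.
train_junk() {
    user="$1"
    junk_dir="$2"
    for msg in "$junk_dir"/cur/* "$junk_dir"/new/*; do
        [ -f "$msg" ] || continue    # skip globs that matched nothing
        if "$DSPAM_BIN" --user "$user" --class=spam --source=error < "$msg"; then
            echo "ok $msg"
        else
            echo "FAILED $msg"
        fi
    done
}

# Example call (hypothetical path):
#   train_junk gina /home/gina/Maildir/.Junk
```

A run that prints only "ok" lines shows that dspam accepted the messages; it
does not by itself prove that the tokens were learned for the right user.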
> 
> So I do not believe that dspam is being trained. Moreover, for the
> majority of users, both your sed command and my method report the mails
> as innocent.
> My collective junk folder holds over 270 emails from today alone! All of
> them were fed to the training command, but dspam reports "Innocent" for
> the same files! Any ideas?
>
Please post:
- the content of mail_file
- the content of your dspam.conf
- the output of:
   dspam --version
- the output of:
   cat "$(dspam --version 2>&1|sed -n "s:.*\-\-with\-dspam\-home=\([^'\"]*\).*:\1:p")"/group
- the output of:
   cat "$(dspam --version 2>&1|sed -n "s:.*\-\-sysconfdir=\([^'\"]*\).*:\1:p")"/group

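The command output requested above can be gathered in one go. The following
is a rough sketch, not a DSPAM tool: configure_value and collect_diagnostics
are made-up helper names, and it assumes, as the sed expressions above do,
that "dspam --version" prints its configure options as quoted
'--name=value' arguments.

```shell
#!/bin/sh
# configure_value OPTION
# Read "dspam --version"-style text on stdin and print the value passed to
# one configure option, e.g. OPTION="with-dspam-home".
configure_value() {
    sed -n "s:.*--$1=\([^'\"]*\).*:\1:p"
}

# collect_diagnostics
# Print the version banner, then any "group" file found under the
# directories named by --with-dspam-home and --sysconfdir.
collect_diagnostics() {
    command -v dspam >/dev/null 2>&1 || {
        echo "dspam not found in PATH" >&2
        return 1
    }
    version_text=$(dspam --version 2>&1)
    printf '%s\n' "$version_text"
    for opt in with-dspam-home sysconfdir; do
        dir=$(printf '%s\n' "$version_text" | configure_value "$opt")
        if [ -n "$dir" ] && [ -f "$dir/group" ]; then
            echo "== $dir/group"
            cat "$dir/group"
        fi
    done
    return 0
}
```

Running "collect_diagnostics > dspam-report.txt" would give one file to
attach; mail_file and dspam.conf still need to be posted separately.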

> Peter
> 
Steve
