On Wednesday 06 August 2008 00:28:34 Steve wrote:
> -------- Original Message --------
>
> > Date: Tue, 5 Aug 2008 11:14:44 +0300
> > From: s91066 <[EMAIL PROTECTED]>
> > To: [email protected]
> > CC: [EMAIL PROTECTED]
> > Subject: [dspam-users] Train dspam
> >
> > I am trying to understand how dspam is trained. My setup is IMAP + Maildir,
> > so I have created a Junk and a NoSpam folder, into which users move spam
> > and false positives respectively.
> > I run a script every hour to collect this data. The script trains dspam
> > as:
> > dspam --user $USER --class=spam --source=error < $j
> > where $USER is the username (not the mail address, but the username) and
> > $j is a file that is spam but was classified as Innocent.
> >
> > Now, what I cannot understand is this:
> > I have a lot of emails with the same subject and an almost identical body.
> > I trained dspam on those emails as errors. However, I still receive
> > them!
> > Since I still have the emails, I ran dspam from the command line to see
> > the classification result:
> > dspam --mode=notrain --user username --classify --stdout < mail_file
> > The result was:
> > X-DSPAM-Result: username; result="Innocent"; class="Innocent";
> > probability=0.0000; confidence=1.00; signature=489787f2131472612618147
> >
> > So, why? The message was fed to dspam just a couple of minutes ago, with
> > the same command as above (source=error).
>
> It is very easy. You called DSPAM with "--mode=notrain". Right? This means
> that the command will NOT train DSPAM. Right? You call it with "--classify"
> and with "--stdout". This means that the command will CLASSIFY the message
> and print out the output (the whole output) to the screen. Right? Do you
> see the result having a signature? Now ask yourself how the signature got
> there even when you told DSPAM to NOT TRAIN and you told DSPAM to CLASSIFY.
> And what does DSPAM do? It prints out a signature. But a signature is only
> created when the message gets tagged, and tagging should not happen with
> "--mode=notrain --classify". Hmmm... well... very easy: The message you
> feed to DSPAM ALREADY HAS A SIGNATURE. That is the problem. Could you try
> this and tell me what the outcome is:
>
> sed "/^\(X\-Quarantine\-ID:\|X\-OSBF\-Lua\-Score:\|X\-CRM114\-[a-zA-Z]*:\|X\-\(DKIM\|SenderID\):\|X\-Virus\-Scanned:\|X\-Greylist:\|X\-DCC\-.*\-Metrics:\|X\-\(Virus\|Pyzor\|Razor\)\-Status:\|X\-Delivery\-Agent:\|Received\-SPF:\|X\-policyd\-weight:\|X\-Spam\-[^:]*:\) .*$/d;/^X\-Amavis\-OS\-Fingerprint:/,+1d;/^X\-DSPAM\-Result\:/,/^X\-DSPAM\-Signature: [0-9a-f,]*$/d;s/^Subject: \(\(ADV\|UNS\):[\t ]\{1,99\}\)\{0,1\}\(\[[+-]\{1,2\}\][\t ]\{1,99\}\)\{0,1\}\(\[SPAM\][\t ]\{1,99\}\)\{0,1\}/Subject: /;s/\0-9]\{0,9\},\{0,1\}[0-9a-f]\{1,32\}\!//g" mail_file |
>     dspam --user username --classify --stdout --mode=notrain --deliver=summary
>
>
> Do you still get DSPAM reporting that message as Innocent? Probably not.
> Right?
>
> > Shouldn't dspam report the file as spam?
>
> No. See above.
>
> > Thank you
> > Peter
>
> Steve
>
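
Steve's sed one-liner removes DSPAM's own headers (and those of several other filters) and, judging by its last substitution, is also meant to remove the inline !DSPAM:...! tag, so that the message is classified as if it had never been seen. A smaller sketch of just the DSPAM part, written as a shell function around awk, might look like this. The name strip_dspam_marks is hypothetical; the sketch assumes the inline tag has the form !DSPAM:<digits>,<hex>! as in the messages of this thread, and that folded header lines start with a space or tab.

```shell
#!/bin/sh
# strip_dspam_marks [FILE...]: hypothetical helper, not part of DSPAM.
# Copies a message to stdout without DSPAM's own marks:
#   - header lines named X-DSPAM-* (plus their folded continuation lines)
#   - inline body tags of the form !DSPAM:<digits>,<hex>!
# Assumes a blank line separates the headers from the body.
strip_dspam_marks() {
    awk '
        in_body {                                     # body: drop inline tags
            gsub(/!DSPAM:[0-9]+,[0-9a-f]+!/, "")
            print
            next
        }
        /^$/ { in_body = 1; print; next }             # end of the headers
        /^[ \t]/ { if (!skipping) print; next }       # folded header line
        { skipping = ($0 ~ /^X-DSPAM-[A-Za-z-]*:/) }  # start of a new header
        !skipping { print }
    ' "$@"
}
```

Usage would then be, for example: strip_dspam_marks mail_file | dspam --user username --classify --stdout --mode=notrain --deliver=summary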

OK, let me be more 'verbose':
mail_file: a mail file that has already been scanned by DSPAM. The user moved
the file to the .Junk IMAP folder, so the mail file naturally has a signature.
Thus, DSPAM has already classified the mail, as either spam or not.
The output of your sed command is:
        As user [EMAIL PROTECTED]:
X-DSPAM-Result: [EMAIL PROTECTED]; result="Innocent"; class="Innocent"; probability=0.0000; confidence=1.00; signature=N/A
        As user [EMAIL PROTECTED]:
X-DSPAM-Result: [EMAIL PROTECTED]; result="Innocent"; class="Innocent"; probability=0.0000; confidence=1.00; signature=N/A
        As unix_username:
X-DSPAM-Result: gina; result="Innocent"; class="Innocent"; probability=0.0000; confidence=0.99; signature=N/A
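
To check a whole folder rather than one file at a time, a sketch like the following classifies every file with --mode=notrain (so nothing is trained) and tallies the result="..." field of each summary line. The function name count_results and the DSPAM_BIN override are hypothetical, not part of DSPAM; it assumes --deliver=summary prints an X-DSPAM-Result line like the ones above.

```shell
#!/bin/sh
# count_results USER DIR: hypothetical helper, not part of DSPAM.
# Classifies every file in DIR without training and prints a count per
# result value, e.g. "   270 Innocent".
# DSPAM_BIN (default: dspam) is an assumed override, e.g. for testing.
count_results() (
    user=$1 dir=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip an unmatched glob pattern
        "${DSPAM_BIN:-dspam}" --user "$user" --classify --stdout \
            --mode=notrain --deliver=summary < "$f" |
            sed -n 's/^X-DSPAM-Result:.*[; ]result="\([^"]*\)".*/\1/p'
    done | sort | uniq -c
)
```

For example, count_results gina "$HOME/Maildir/.Junk/cur" (the Maildir path is only an illustration) would show how many of the messages in the Junk folder DSPAM still calls Innocent.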


So, in every case, dspam classifies as Innocent a spam message that was
already fed to it with "dspam --user $USER --class=spam --source=error < $j".
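
The hourly training run described in the quoted message (every file in the Junk folder fed to dspam with --class=spam --source=error) might look roughly like the sketch below. It is an assumption about the script, not Peter's actual code: the name train_folder, the Maildir cur/new layout and the DSPAM_BIN override are all hypothetical, while the dspam options are the ones quoted in this thread.

```shell
#!/bin/sh
# train_folder USER MAILDIR CLASS: hypothetical helper, not part of DSPAM.
# Feeds every message in MAILDIR/cur and MAILDIR/new to dspam as a
# misclassification of CLASS (spam or innocent) and prints how many
# messages were fed. Stops at the first dspam failure.
# DSPAM_BIN (default: dspam) is an assumed override, e.g. for testing.
train_folder() (
    user=$1 maildir=$2 class=$3
    count=0
    for j in "$maildir"/cur/* "$maildir"/new/*; do
        [ -f "$j" ] || continue              # skip an unmatched glob pattern
        "${DSPAM_BIN:-dspam}" --user "$user" --class="$class" \
            --source=error < "$j" || exit 1
        count=$((count + 1))
    done
    echo "$count"
)
```

It would be called as, for example, train_folder "$USER" "$HOME/Maildir/.Junk" spam and train_folder "$USER" "$HOME/Maildir/.NoSpam" innocent. As written, the sketch leaves the files in place, so an hourly cron job would feed the same messages again on every run; a real script would need to move trained messages elsewhere afterwards.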

So I do not believe that dspam is being trained. Moreover, for the majority of
users, both your sed command and my method report these mails as Innocent.
I have over 270 emails in my collective Junk folder from today alone! All of
them have been fed to the training command, but dspam still reports the same
files as "Innocent"! Any ideas?
Peter



