-------- Original Message --------
> Date: Tue, 5 Aug 2008 11:14:44 +0300
> From: s91066 <[EMAIL PROTECTED]>
> To: [email protected]
> CC: [EMAIL PROTECTED]
> Subject: [dspam-users] Train dspam
> I am trying to understand how dspam is trained. My setup is IMAP + Maildir,
> so I have created a Junk and a NoSpam directory, into which users move the
> spam and the false positives respectively.
> I run a script every hour to collect these messages. The script trains dspam
> with:
> dspam --user $USER --class=spam --source=error < $j
> where $USER is the username (not the mail address) and $j is a file that is
> spam but was classified as Innocent.
>
> Now, what I cannot understand is this:
> I have a lot of emails with the same subject and almost identical bodies. I
> have trained dspam on those emails as errors. However, I still receive them!
> Since I still have the emails, I ran dspam from the command line to see the
> classification result:
> dspam --mode=notrain --user username --classify --stdout < mail_file
> The result was:
> X-DSPAM-Result: username; result="Innocent"; class="Innocent";
> probability=0.0000; confidence=1.00; signature=489787f2131472612618147
>
> So, why? The message was fed to dspam just a couple of minutes ago, with the
> same command as above (source=error).
>
It is quite simple. You called DSPAM with "--mode=notrain", which means the
command will NOT train DSPAM. You also called it with "--classify" and
"--stdout", which means the command will CLASSIFY the message and print the
whole output to the screen. Now look at the result: it contains a signature.
Ask yourself how that signature got there when you told DSPAM to NOT TRAIN and
only to CLASSIFY. A signature is only created when the message gets tagged, and
tagging should not happen with "--mode=notrain --classify". The explanation:
the message you fed to DSPAM ALREADY HAS A SIGNATURE. That is the problem.
Could you try this and tell me what the outcome is:
sed \
-e "/^\(X\-Quarantine\-ID:\|X\-OSBF\-Lua\-Score:\|X\-CRM114\-[a-zA-Z]*:\|X\-\(DKIM\|SenderID\):\|X\-Virus\-Scanned:\|X\-Greylist:\|X\-DCC\-.*\-Metrics:\|X\-\(Virus\|Pyzor\|Razor\)\-Status:\|X\-Delivery\-Agent:\|Received\-SPF:\|X\-policyd\-weight:\|X\-Spam\-[^:]*:\) .*$/d" \
-e "/^X\-Amavis\-OS\-Fingerprint:/,+1d" \
-e "/^X\-DSPAM\-Result\:/,/^X\-DSPAM\-Signature: [0-9a-f,]*$/d" \
-e "s/^Subject: \(\(ADV\|UNS\):[\t ]\{1,99\}\)\{0,1\}\(\[[+-]\{1,2\}\][\t ]\{1,99\}\)\{0,1\}\(\[SPAM\][\t ]\{1,99\}\)\{0,1\}/Subject: /" \
-e "s/\!DSPAM:[0-9]\{0,9\},\{0,1\}[0-9a-f]\{1,32\}\!//g" \
mail_file | dspam --user username --classify --stdout --mode=notrain --deliver=summary
Do you still get DSPAM reporting that message as Innocent? Probably not. Right?
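If you only want to check for, or remove, DSPAM's own markers and leave the
other scanner headers alone, something along these lines might do as a rough
sketch. The function names has_dspam_markers and strip_dspam_markers are made
up for this example and are not part of DSPAM. The sketch assumes the markers
are "X-DSPAM-*:" header lines (possibly folded onto continuation lines) and
"!DSPAM:...!" tags like the one at the bottom of your mail:

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with DSPAM.

# Exit status 0 if the file named in $1 already carries DSPAM markers:
# an X-DSPAM-* header line or a "!DSPAM:<digits>,<hex>!" tag.
has_dspam_markers() {
    grep -Eq '^X-DSPAM-[A-Za-z-]+:|!DSPAM:[0-9]*,?[0-9a-f]+!' "$1"
}

# Read a message on stdin and write it to stdout without X-DSPAM-* headers
# (including folded continuation lines) and without "!DSPAM:...!" tags.
strip_dspam_markers() {
    awk '
        BEGIN { in_header = 1; skipping = 0 }
        # The first empty line ends the header section.
        in_header && /^$/ { in_header = 0 }
        # Drop a DSPAM header line and remember that we are inside it.
        in_header && /^X-DSPAM-[A-Za-z-]+:/ { skipping = 1; next }
        # Drop continuation lines that belong to the dropped header.
        in_header && skipping && /^[ \t]/ { next }
        # Everything else is kept, minus any inline DSPAM tag.
        { skipping = 0; gsub(/!DSPAM:[0-9]*,?[0-9a-f]+!/, ""); print }
    '
}
```

You could then run "has_dspam_markers mail_file && echo already tagged" to see
whether the file is affected, and
"strip_dspam_markers < mail_file | dspam --user username --classify --stdout --mode=notrain"
to repeat your test on a cleaned copy.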
> Shouldn't dspam report the file as spam?
>
No. See above.
> Thank you
> Peter
>
Steve