-------- Original Message --------
> Date: Thu, 19 Nov 2009 10:32:34 +0100
> From: coma <[email protected]>
> To: [email protected]
> Subject: [Dspam-user] Dspam Headers

> Hi,
> 
Hello Coma,


> I have a question on the  X-DSPAM-Confidence and X-DSPAM-Probability
> calculation,
> 
> I have searched on the net, in archives (2004-2009) and source code (but
> It's difficult for me) but i have not found a clear answer that allows me
> to understand.
> 
> I think graham & burton algorithms calculates the probability and
> confidence that the mail is a spam or notspam with the frequency of
> occurrence of each token corresponding to the words of the mail, it's good?
> 
More or less, yes. It is the tokens that count, not the words. The tokenizer
decides what is treated as a token. Example:
WORD: token -> uniGram (single word)
CHAIN: token -> biGram (chained tokens)
SBPH: token -> Sparse Binary Polynomial Hashing
OSB: token -> Orthogonal Sparse biGram

A more detailed description of the tokenizers used in DSPAM is here:
-> http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Tokenizers
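To make the difference between WORD and CHAIN concrete, here is a small
sketch in Python. It is only an illustration and not DSPAM's own code: the
function names are made up, splitting on whitespace is a simplification, and
joining chained words with "+" is an assumption based on how chained tokens
look in the X-DSPAM-Factors header.

```python
def word_tokens(text):
    """WORD-style sketch: each whitespace-separated word is one token (uniGram)."""
    return text.split()


def chain_tokens(text):
    """CHAIN-style sketch: the single words plus each pair of adjacent words (biGram)."""
    words = text.split()
    # Pair every word with its right-hand neighbour; "+" as joiner is an assumption.
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs


if __name__ == "__main__":
    print(word_tokens("call us per day"))
    print(chain_tokens("call us per day"))
```

With CHAIN, the same mail produces more tokens than with WORD, because every
adjacent word pair gets a probability of its own in addition to the single
words.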


> But I don't really understand how this calculation is made.
> 
> I would like to know at whitch moment a mail is considered as a spam, I
> think > 0.5 no?
> 
Normally, yes: anything above 0.5 counts as spam.
A value of exactly 0.5 means that a token is neutral, neither spam nor ham.


> For X-dspam-Factors, for example: X-dspam-Factors: 15, and + everything,
> 0.99000, + call us, 0.99000, Judicial, 0.99000, + per day, 0.99000, + and
> lose, 0.99000, cost + to, 0.99000, ...........
> 
> I think it's the probability for each word corresponding to a token, but
> what the 15?
> 
15 is the number of tokens considered. Graham takes the 15 most significant
tokens and uses them for the computation.
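As a rough illustration of that computation, here is a Python sketch of the
combining step as Paul Graham describes it in "A Plan for Spam". It is not
DSPAM's actual code, and DSPAM may differ in the details. The function names,
the clamping of values to 0.01..0.99 and the 0.5 threshold parameter are
choices made for this example, and the sketch does not cover how
X-DSPAM-Confidence is derived.

```python
from math import prod


def most_significant(probs, limit=15):
    """Keep the `limit` token probabilities that lie farthest from the neutral 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:limit]


def graham_combine(probs):
    """Combine per-token spam probabilities into one value between 0 and 1."""
    # Clamp so that a single 0.0 or 1.0 cannot zero out both products.
    clamped = [min(0.99, max(0.01, p)) for p in probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


def is_spam(token_probs, threshold=0.5, limit=15):
    """Sketch of the decision: spam if the combined value exceeds the threshold."""
    return graham_combine(most_significant(token_probs, limit)) > threshold


if __name__ == "__main__":
    factors = [0.99] * 6 + [0.5] * 30   # six strong spam tokens, many neutral ones
    print(graham_combine(most_significant(factors)))
    print(is_spam(factors))
```

Neutral tokens (0.5) contribute equally to both products, so they do not move
the result. That is why only the most significant tokens are worth keeping.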


> Thank you in advance if you can help me once again, and sorry again for my
> strange English.
> 
I feel guilty! I do! I was the one who wrote about your English, and I feel
guilty! Don't apologize for your English. Just write however you think is
right. I will ask if I don't understand your question or comment, and others
will probably do the same. So please don't apologize any more for your
language. I guess I would be terrible in your native language, and you would
surely still try to help me if I wrote in it (what is it, anyway?). We are a
bunch of people from all over the world here on the mailing list. It's not
always easy, but we manage :)


> coma
>
Steve