-------- Original Message --------
> Date: Thu, 19 Nov 2009 10:32:34 +0100
> From: coma <[email protected]>
> To: [email protected]
> Subject: [Dspam-user] Dspam Headers

> Hi,
>

Hello Coma,

> I have a question on the X-DSPAM-Confidence and X-DSPAM-Probability
> calculation,
>
> I have searched on the net, in archives (2004-2009) and source code (but
> It's difficult for me) but i have not found a clear answer that allows me
> to understand.
>
> I think graham & burton algorithms calculates the probability and
> confidence that the mail is a spam or notspam with the frequency of
> occurrence of each token corresponding to the words of the mail, it's good?
>

More or less, yes. It is not the words that count but the tokens. The tokenizer decides what is treated as a token. Example:

WORD:  token -> uniGram (single word)
CHAIN: token -> biGram (chained tokens)
SBPH:  token -> Sparse Binary Polynomial Hashing
OSB:   token -> Orthogonal Sparse biGram

A more detailed description of the tokenizers used in DSPAM is here:
http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Tokenizers

> But I don't really understand how this calculation is made.
>
> I would like to know at whitch moment a mail is considered as a spam, I
> think 0.5 no?
>

Normally, yes: anything above 0.5. A value of 0.5 indicates that a token is neutral, neither spam nor ham.

> For X-dspam-Factors, for example: X-dspam-Factors: 15, and + everything,
> 0.99000, + call us, 0.99000, Judicial, 0.99000, + per day, 0.99000, + and
> lose, 0.99000, cost + to, 0.99000, ...........
>
> I think it's the probability for each word corresponding to a token, but
> what the 15?
>

15 is the number of tokens considered. Graham takes the 15 most significant tokens and uses them for the computation.

> Thank you in advance if you can help me once again, and sorry again for my
> strange English.
>

I feel guilty! I do! I was the one who wrote about your English, and I feel guilty! Don't apologize for your English. Don't do that. Just write however you think is right. I will ask you if I don't understand your question or comment. Others will probably do the same.
So please don't apologize any more for your language. I guess I would be terrible in your native language, and you would surely still try to help me if I wrote in it (what is it, anyway?). We are a bunch of people from all over the world here on the mailing list. It's not always easy, but we manage :)

> coma
>

Steve
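
To make the WORD vs. CHAIN distinction above a bit more concrete, here is a rough Python sketch. It is only an illustration, not DSPAM's tokenizer: the function names word_tokens and chain_tokens are made up for this example, the word-splitting regex is an assumption, and joining pairs with "+" is just modeled on how chained tokens appear in the quoted X-DSPAM-Factors header.

```python
import re


def word_tokens(text):
    """WORD-style sketch: every single word is one token (uniGram)."""
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    return re.findall(r"[A-Za-z0-9']+", text)


def chain_tokens(text):
    """CHAIN-style sketch: every pair of adjacent words is one token (biGram)."""
    words = word_tokens(text)
    # Pair each word with its successor; "+" as joiner is an assumption.
    return [f"{a}+{b}" for a, b in zip(words, words[1:])]


if __name__ == "__main__":
    sample = "call us per day"
    print(word_tokens(sample))
    print(chain_tokens(sample))
```

The real tokenizers handle delimiters, headers and the sparse variants (SBPH, OSB) in ways this sketch does not attempt to reproduce; the wiki page linked above is the place to look for those details.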

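
The "15 most significant tokens" step and the 0.5 threshold can be sketched as follows. This follows the combining formula from Paul Graham's published description of his filter (product of the token probabilities divided by that product plus the product of their complements); it is not taken from the DSPAM source, and the names most_significant, graham_combine and classify are hypothetical. It also assumes every token probability lies strictly between 0 and 1, as in the 0.99000 values shown in the factors header.

```python
from math import prod

NEUTRAL = 0.5      # a token at 0.5 is neither spam nor ham
MAX_TOKENS = 15    # number of most significant tokens considered


def most_significant(token_probs, limit=MAX_TOKENS):
    """Return up to `limit` (token, probability) pairs furthest from 0.5."""
    ranked = sorted(
        token_probs.items(),
        key=lambda item: abs(item[1] - NEUTRAL),
        reverse=True,
    )
    return ranked[:limit]


def graham_combine(probs):
    """Combine per-token probabilities into one message probability."""
    probs = list(probs)
    spam = prod(probs)                      # evidence for spam
    ham = prod(1.0 - p for p in probs)      # evidence for ham
    return spam / (spam + ham)


def classify(token_probs, threshold=NEUTRAL):
    """Return (score, is_spam); spam means the score is above the threshold."""
    chosen = most_significant(token_probs)
    score = graham_combine(p for _, p in chosen)
    return score, score > threshold


if __name__ == "__main__":
    # Made-up token probabilities, for illustration only.
    example = {"call+us": 0.99, "per+day": 0.99, "the": 0.5, "meeting": 0.2}
    print(classify(example))
```

A neutral token (0.5) contributes equally to both products and so does not move the result, which is why only the tokens furthest from 0.5 matter for the final score.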