Re: [dspam-users] chained

Jeffrey Taylor Mon, 02 Jun 2008 06:04:24 -0700

Quoting Qmail List <[EMAIL PROTECTED]>:
> Hi,
> 
> Had googled and found some threads about No such feature 'chained'. Changed
> my config file to Tokenizer chained/ Tokenizer chain. But the error does not
> go away.
> 
> @40000000484390a4150f0f54 3358: [06/02/2008 14:18:02] No such feature
> 'chained'


From /etc/dspam.conf and "man dspam":

# Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
# responsible for parsing the message into individual tokens. Depending on
# how many resources you are willing to trade off vs. accuracy, you may
# choose to use a less or more detailed tokenizer:
#   word    uniGram (single word) tokenizer
#           Tokenizes message into single individual words/tokens
#           example: "free" and "viagra"
#   chain   biGram (chained tokens) tokenizer (default)
#           Single words + chains adjacent tokens together
#           example: "free" and "viagra" and "free viagra"
#   sbph    Sparse Binary Polynomial Hashing tokenizer
#           Creates sparse token patterns across sliding window of 5-tokens
#           example: "the quick * fox jumped" and "the * * fox jumped"
#   osb     Orthogonal Sparse biGram
#           Similar to SBPH, but only uses the biGrams
#           example: "the * * fox" and "the * * * jumped"
#
Tokenizer chain
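
To make the difference between "word" and "chain" in the comment block above concrete, here is a rough Python sketch. It is not DSPAM's actual tokenizer: the function names are made up for illustration, and splitting on whitespace is an assumption (DSPAM uses its own delimiter rules).

```python
def word_tokens(text):
    # uniGram sketch: one token per word.
    # Assumption: plain whitespace splitting and lowercasing.
    return text.lower().split()


def chain_tokens(text):
    # biGram sketch: the single words, plus every pair of adjacent words.
    words = word_tokens(text)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


print(word_tokens("free viagra"))   # ['free', 'viagra']
print(chain_tokens("free viagra"))  # ['free', 'viagra', 'free viagra']
```

The output matches the examples in the config comments: "word" gives "free" and "viagra", while "chain" also adds "free viagra".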



       --feature=[chained,noise,tb=N,whitelist]
              Specifies the features that should be activated for this
              filter instance.  The following features may be used
              individually or combined using a comma as a delimiter:

              chained : Chained Tokens (also known as biGrams).  Chained
              Tokens combines adjacent tokens, presently with a window size
              of 2, to form token "chains".  Chained tokens uses additional
              storage resources, but greatly improves accuracy.  Recommended
              as a default feature.

              noise : Bayesian Noise Reduction (BNR).  Bayesian Noise
              Reduction kicks in at 2500 innocent messages and provides an
              advanced progressive noise logic to reduce Bayesian Noise
              (wordlist attacks) in spams.  See
              http://bnr.nuclearelephant.com for more information.

              tb=N : Sets the training loop buffering level.  Training loop
              buffering is the amount of statistical sedation performed to
              water down statistics and avoid false positives during the
              user's training loop.  The training buffer sets the buffer
              sensitivity, and should be a number between 0 (no buffering
              whatsoever) to 10 (heavy buffering).  The default is 5, half
              of what previous versions of DSPAM used.  To avoid dulling
              down statistics at all during the training loop, set this to 0.

              whitelist : Automatic whitelisting.  DSPAM will keep track of
              the entire "From:" line for each message received per user,
              and automatically whitelist messages from senders with more
              than 20 innocent messages and zero spams.  Once the user
              reports a spam from the sender, automatic whitelisting will
              automatically be deactivated for that sender.  Since DSPAM
              uses the entire "From:" line, and not just the sender's email
              address, automatic whitelisting is a very safe approach to
              improving accuracy especially during initial training.

              sbph : Sparse Binary Polynomial Hashing.  Bill Yerazunis'
              tokenizer method from CRM114.  Tokenizer method only - works
              with existing combination algorithms.
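
The automatic whitelisting rule described in the man page text above can be sketched roughly like this. This is only an illustration of the rule as worded there (more than 20 innocent messages, zero spams, keyed on the whole "From:" line); the class and method names are hypothetical and say nothing about how DSPAM stores or checks these counts internally.

```python
WHITELIST_THRESHOLD = 20  # "more than 20 innocent messages", per the man page


class SenderStats:
    """Hypothetical per-user tally, keyed on the entire "From:" line."""

    def __init__(self):
        self.counts = {}

    def record(self, from_line, is_spam):
        # Count one message for this exact "From:" line.
        entry = self.counts.setdefault(from_line, {"innocent": 0, "spam": 0})
        entry["spam" if is_spam else "innocent"] += 1

    def is_whitelisted(self, from_line):
        entry = self.counts.get(from_line, {"innocent": 0, "spam": 0})
        # A single reported spam turns whitelisting off for that sender.
        return entry["innocent"] > WHITELIST_THRESHOLD and entry["spam"] == 0


stats = SenderStats()
sender = '"Alice Example" <alice@example.org>'
for _ in range(21):
    stats.record(sender, is_spam=False)
print(stats.is_whitelisted(sender))  # True

stats.record(sender, is_spam=True)
print(stats.is_whitelisted(sender))  # False
```

Because the key is the whole "From:" line, a message with the same address but a different display name would be counted as a different sender in this sketch.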

