I am using htdig 3.1.2, and my config file includes:

extra_word_characters:  _
valid_punctuation:      !@#$%^&*()-+|~=`{}[]:";'<>?,./

I find that the word database built by htdig includes many words that
contain or end in a comma or other punctuation. For example:

arts,   i:2514  l:1     w:49950
assessed,       i:2523  l:1     w:49950
atmospheric,    i:2529  l:1     w:49950
b.sc,   i:120   l:1     w:49950
b.sc,   i:16406 l:1     w:49950
b.sc,   i:16409 l:1     w:49950
b.sc,   i:3039  l:1     w:49950
b.sc,   i:3040  l:1     w:49950
b.sc,   i:3041  l:1     w:49950
ba,     i:17    l:1     w:49950

Am I misunderstanding the documentation on "valid_punctuation"?
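
To make my expectation concrete, here is a small Python sketch of how I
read the documentation. It is not htdig's code: the function name
expected_words is made up for illustration, and the handling of a
character listed in both attributes (kept as a letter) and the
lowercasing are my own assumptions.

```python
# Attribute values copied from my config file.
EXTRA_WORD_CHARACTERS = "_"
VALID_PUNCTUATION = "!@#$%^&*()-+|~=`{}[]:\";'<>?,./"


def expected_words(text):
    """Return the words I would expect to be indexed (hypothetical helper)."""
    words = []
    current = []
    # A trailing space flushes the last word through the same code path.
    for ch in text + " ":
        if ch.isalnum() or ch in EXTRA_WORD_CHARACTERS or ch in VALID_PUNCTUATION:
            # Letters, digits, extra word characters and valid punctuation
            # do not end a word.
            current.append(ch)
            continue
        # Any other character ends the word. Valid punctuation is then
        # deleted; a character also listed in extra_word_characters is
        # kept (assumption, the documentation does not say).
        word = "".join(
            c for c in current
            if c not in VALID_PUNCTUATION or c in EXTRA_WORD_CHARACTERS
        )
        if word:
            words.append(word.lower())  # the database dump is lowercase
        current = []
    return words


print(expected_words("Arts, B.Sc, and BA, post_doc"))
```

Under this reading I would expect "arts" and "bsc" in the database, never
"arts," or "b.sc,".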

I can't figure out how the configuration file attributes 

        extra_word_characters 
and
        valid_punctuation 

work together.  What happens when the same character is in both?

Why doesn't the documented list of default characters for
valid_punctuation include the question mark (?) and the double quote (")?

What separates words? Is it whitespace only?

Thanks

-- 
 
David Adams
Computing Services
University of Southampton
