At 2:37 PM +0100 9/3/99, David Adams wrote:
>1) Documentation
>
>The ht://dig documentation is excellent, but could I suggest the
>following text to replace the "description" of valid_punctuation in the
>online documentation:
Suggestions for documentation are *always* welcome.
>prefix_match_character and explicitly placing it in valid_punctuation
>stops a "prefix" search from working.
This is already fixed in the 3.2 development code. We changed the way
those characters were stripped from the query, in part because we
added a regex fuzzy algorithm.
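To show why the order of operations matters, here is a minimal Python sketch. It is not ht://Dig's code and not necessarily what the 3.2 change does. The function names and the VALID_PUNCTUATION value are hypothetical, and '*' is assumed to have been added to valid_punctuation. If punctuation is stripped before the prefix marker is checked, the marker is already gone when the check runs:

```python
# Hypothetical illustration only; names and values are assumptions, not ht://Dig's code.
VALID_PUNCTUATION = "-_/!#$%^&'*"   # assumption: the user has added '*' here
PREFIX_MATCH_CHARACTER = "*"

def strip_punctuation(word: str, punct: str) -> str:
    """Remove every character listed in punct from word."""
    return "".join(ch for ch in word if ch not in punct)

def parse_term_naive(word: str):
    """Strip punctuation first, then look for the prefix marker (marker is lost)."""
    cleaned = strip_punctuation(word, VALID_PUNCTUATION)
    is_prefix = cleaned.endswith(PREFIX_MATCH_CHARACTER)
    return cleaned, is_prefix

def parse_term_fixed(word: str):
    """Detect and remove the prefix marker before stripping punctuation."""
    is_prefix = word.endswith(PREFIX_MATCH_CHARACTER)
    if is_prefix:
        word = word[: -len(PREFIX_MATCH_CHARACTER)]
    return strip_punctuation(word, VALID_PUNCTUATION), is_prefix
```

With this sketch, parse_term_naive("comp*") gives ("comp", False), so no prefix search happens. parse_term_fixed("comp*") gives ("comp", True).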
>The number of such words is relatively few: out of over 2 million
>entries in the wordlist file only 127 contain '(' and fewer than five
>hundred contain ',' or '.'. All the entries for these words have a w:
>(weighting?) of 49950 or larger. I've searched for a few of these
>words and all occur in either META keywords or META contents, which are
>scored highly. Could there be a bug specific to the processing of the
>text between <HEAD> and </HEAD>?
I think you're probably right. I don't think there's anything that's
stripping out valid_punctuation for that code. Grr. Thanks for the
heads up.
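The symptom David describes matches what would happen if META content took a code path that splits on whitespace but never removes valid_punctuation. The following is a hypothetical Python sketch of that situation, not ht://Dig's parser. The function names and the VALID_PUNCTUATION value are assumptions chosen to match the characters he found ('(', ',', '.'):

```python
# Hypothetical illustration only; not ht://Dig's actual META handling.
VALID_PUNCTUATION = ".,()'"   # assumption, chosen to match the reported characters

def normalize(word: str) -> str:
    """Drop valid_punctuation characters and lowercase the word."""
    return "".join(ch for ch in word if ch not in VALID_PUNCTUATION).lower()

def words_from_meta_buggy(content: str) -> list:
    # Splits on whitespace only, so punctuation stays attached to words.
    return [t.lower() for t in content.split()]

def words_from_meta_fixed(content: str) -> list:
    # Applies the same normalization a body-text path would, dropping empty results.
    return [w for w in (normalize(t) for t in content.split()) if w]
```

For the content "Widgets (blue), gadgets." the buggy version yields "(blue)," and "gadgets.", which is the kind of entry that ends up in the word list. The fixed version yields only plain words.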
>is indexed as "'hello" and "there'". Is there a way around this?
Hmm. This is a bit of a problem. On the one hand, the result in your
example looks wrong. On the other hand, say you were indexing the
mailing list archives for the GCC developers. You want to index words
like '__builtin' and '#include' and company.
So we're stuck! The extra_word_chars attribute was added for exactly
that purpose: to index the GCC mailing lists. So there's nothing
stopping words from having these characters at the front.
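Here is a simplified Python sketch of the trade-off. It is a hypothetical tokenizer, not ht://Dig's word parser. The only assumption it makes is that any character listed in extra_word_chars counts as part of a word wherever it appears. That is exactly what lets '#include' and '__builtin' through, and also what turns a quoted phrase into "'hello" and "there'":

```python
from typing import List

def tokenize(text: str, extra_word_chars: str = "") -> List[str]:
    """Split text into words made of alphanumerics plus any extra_word_chars.

    Hypothetical sketch: extra_word_chars are accepted anywhere in a word,
    including the first and last positions.
    """
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalnum() or ch in extra_word_chars:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

In this sketch, tokenize("#include __builtin", "#_") keeps both identifiers intact, while tokenize("'hello there'", "'") produces "'hello" and "there'". Without extra_word_chars you get "include" and "builtin" instead.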
Any suggestions?
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/