Has anyone else had any problems when searching for hyphenated names or
words? I was searching for Hewlett-Packard and, even though there were
thousands of matches with the name in the first few bytes of the saved
excerpt, none of the excerpts were displayed. I have a hyphen in the
valid_punctuation string in the config file, so maybe the routine which
highlights the search words in the excerpt is trying to compare
HewlettPackard with Hewlett-Packard and not getting any matches?
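
To make that suspicion concrete, here is a minimal Python sketch of the
mismatch I have in mind. It is not htdig's actual code: the function names
normalize and naive_highlight are made up for illustration, and it simply
assumes that the search term has its valid_punctuation characters removed
before being compared with the raw excerpt text.

```python
import re

def normalize(term, valid_punctuation="-"):
    """Drop valid_punctuation characters and lowercase the term.

    This is an assumption about how the search word is prepared,
    not a description of what htdig really does.
    """
    return "".join(ch for ch in term if ch not in valid_punctuation).lower()

def naive_highlight(excerpt, term, valid_punctuation="-"):
    """Wrap literal occurrences of the normalized term in <b> tags."""
    needle = normalize(term, valid_punctuation)
    return re.sub(
        re.escape(needle),
        lambda m: "<b>" + m.group(0) + "</b>",
        excerpt,
        flags=re.IGNORECASE,
    )

excerpt = "Hewlett-Packard announced a new printer."

# The needle is "hewlettpackard", but the excerpt still contains the
# hyphen, so nothing matches and the excerpt comes back unchanged.
print(naive_highlight(excerpt, "Hewlett-Packard"))
```

If the real routine behaves like this, no search word is ever found in
the excerpt, which would fit the missing excerpts I am seeing.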

I tried removing the hyphen from valid_punctuation, but then it started
searching for Hewlett and Packard separately, which was not quite what I
wanted. I tried a few other terms and the results were similar for hi-fi,
x-ray, etc.

Also, does anyone know if there is a maximum word length that htdig will
store? If there is (and on my system it seems to be set to 12), how would I
change it? I am using htdig 3.1.0b2 on a Solaris 2.6 Sparc box.

I created an HTML file containing only the name Hewlett-Packard and indexed
it.

With a hyphen "-" in the valid_punctuation string in the configuration file
my db.wordlist contains:
-0
test    c:1     l:241   i:1     w:75900 a:0
hewlettpacka    c:1     l:632   i:1     w:368   a:0

I looked at a few other instances of db.wordlist on my system and found
lots of odd run-on words like this but none were longer than 12 characters
(some were shorter).

Without the hyphen the db.wordlist looks like this:
-0
test    c:1     l:241   i:1     w:75900 a:0
hewlett c:1     l:632   i:1     w:368   a:0
packard c:1     l:724   i:1     w:276   a:0
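
Both word lists above are consistent with a simple model of the indexer,
sketched below in Python. This is only my guess at the behaviour, not
htdig's real code: the function index_words is hypothetical, and the limit
of 12 characters is inferred from the longest entries I found in
db.wordlist.

```python
MAX_WORD_LENGTH = 12  # assumed limit, inferred from the db.wordlist entries

def index_words(text, valid_punctuation="", max_len=MAX_WORD_LENGTH):
    """Split text into index words under the assumed rules.

    - alphanumeric characters are kept (lowercased)
    - characters in valid_punctuation are dropped without ending the word
    - any other character ends the current word
    - each word is truncated to max_len characters
    """
    words = []
    current = []
    for ch in text:
        if ch.isalnum():
            current.append(ch.lower())
        elif ch in valid_punctuation:
            continue  # skipped, so the two halves run together
        elif current:
            words.append("".join(current)[:max_len])
            current = []
    if current:
        words.append("".join(current)[:max_len])
    return words

# Hyphen in valid_punctuation: one run-on word, cut off at 12 characters.
print(index_words("Hewlett-Packard", valid_punctuation="-"))  # ['hewlettpacka']

# Hyphen not in valid_punctuation: it acts as a word separator.
print(index_words("Hewlett-Packard", valid_punctuation=""))   # ['hewlett', 'packard']
```

If this model is right, the run-on words come from the punctuation being
dropped, and the odd endings come from the length limit.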

I am not sure if this is better, but this weekend I am going to try
re-indexing without any valid_punctuation and see what happens.

Has anyone else solved the hyphenation problem?

Thanks in advance,

Paul Lucas
Frost & Sullivan

