According to Ivan C Chang:
> I have succeeded in running htdig on a site with more than 700,000 links.
> I have also successfully run htmerge to create the word database and
> document index database.
> 
> When I run htsearch for some keywords, the results sometimes include
> links to documents that don't actually contain the keywords.  When this
> happens, the search results report the following statement:
> 
> (None of the search words were found in the top of this document.)
> 
> followed by the link which doesn't actually contain the keyword.
> 
> I don't understand why this happens.

Can you give some specific examples of searches that do this?  Are there
particular words that trigger this, or does it happen seemingly at random?
Is it consistent, or do words that used to work fine cause problems later,
after further database updates?

I've run into one case like the one you describe.  A search for "illness"
turns into (after the endings fuzzy algorithm is applied) a search for
"(illness or ill or ills)".  This ends up matching 3 documents on my site,
none of which contain any of those 3 words, but all of which contain the
word "I'll", which is put into the word database with punctuation removed,
i.e. as "ill", so it matches the "ill" term of the expanded search.
This is consistent and understandable, and not a big problem as far as
I'm concerned.  I hadn't thought of other similar problems, but maybe
you've stumbled onto one.  If there are enough cases like this that
it is a problem, it may be worth working out a solution.  Then again,
your problem may be unrelated to this.
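
To make the collision concrete, here is a rough Python sketch of the effect.
It is not ht://Dig's actual code: normalize(), build_index() and search() are
hypothetical helpers, the two documents are made up, and the expanded word
list is simply copied from the "illness" example above.  It only assumes that
words are lowercased and stripped of punctuation before being indexed.

```python
import re

def normalize(token):
    """Lowercase a token and drop every non-alphanumeric character.

    Assumption: this only mimics 'punctuation removed'; the real
    indexer's rules may differ.
    """
    return re.sub(r"[^a-z0-9]", "", token.lower())

def build_index(docs):
    """Map each normalized word to the set of documents containing it."""
    index = {}
    for name, text in docs.items():
        for token in text.split():
            word = normalize(token)
            if word:
                index.setdefault(word, set()).add(name)
    return index

def search(index, words):
    """Return every document matching any of the words (an OR search)."""
    hits = set()
    for word in words:
        hits |= index.get(word, set())
    return hits

# Made-up documents: only b.html really mentions an illness.
docs = {
    "a.html": "I'll be back soon.",
    "b.html": "Recovering from a long illness.",
}
index = build_index(docs)

# Expansion copied from the example above, not computed here.
expanded = ["illness", "ill", "ills"]
matches = search(index, expanded)
print(sorted(matches))
```

Here a.html is returned even though it contains none of the three search
words as written; its "I'll" was stored as "ill".  A document matched this
way has no excerpt to show around the search words, which would fit the
message you are seeing.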

Another possibility could be database corruption, or outdated entries in
the database.  If you can afford to rebuild your database from scratch,
it may be worth a shot, to see if this makes the problem disappear.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
