I know phrase searching is going to be supported in 3.2, but I was
wondering if there are any plans for rating hits based on word
distance.  Example:

A user searches on "white AND russian", and you have two documents that
contain those two words in the following phrases:

Doc1- Would you like a white russian?  (dist = 1)

Doc2- Do you see a russian over there on the white hill?  (dist = 5)

In the above examples, Doc1 would have a higher rating.
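
To make the distance figures above concrete, here is a minimal sketch of
how they could be computed.  The function names and the 1/distance score
are my own assumptions for illustration; nothing like this exists in htdig.

```python
import re

def min_word_distance(text, word_a, word_b):
    """Smallest gap, counted in words, between word_a and word_b.

    Returns None if either word is missing from the text.
    """
    word_a, word_b = word_a.lower(), word_b.lower()
    # Crude tokenizer (an assumption): lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    last_a = last_b = None
    best = None
    for i, w in enumerate(words):
        if w == word_a:
            last_a = i
        if w == word_b:
            last_b = i
        if last_a is not None and last_b is not None:
            d = abs(last_a - last_b)
            if d > 0 and (best is None or d < best):
                best = d
    return best

def proximity_score(text, word_a, word_b):
    """Hypothetical rating: closer words score higher, missing words score 0."""
    d = min_word_distance(text, word_a, word_b)
    return 0.0 if d is None else 1.0 / d

doc1 = "Would you like a white russian?"
doc2 = "Do you see a russian over there on the white hill?"
print(min_word_distance(doc1, "white", "russian"))  # 1
print(min_word_distance(doc2, "white", "russian"))  # 5
```

Any score that decreases as the distance grows would do; 1/distance is
just the simplest choice.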

I've been toying with this idea in a SQL database.  It doesn't take long
before you realize that keeping track of words up to a distance of 10 takes
about 20x the disk space of the original document, but I can live with that
(disks are cheap).  The only problem is getting htdig to query my DB to get
the distance rating.  So what I thought would be really great is a generic
external rating interface: htsearch would call a program (defined in the
config file) and pass it the URL of each hit that htsearch finds along with
the search words ("white AND russian" in the above case).
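
As a rough sketch of what such an external program might look like, assume
it is invoked with a database path, the hit's URL, and the search string,
and prints a rating on stdout.  The calling convention, the word_pairs
table, and its columns are all hypothetical; htsearch has no such hook
today, and my real schema may end up different.

```python
import sqlite3
import sys

# Boolean operators to drop from the search string (assumed syntax).
OPERATORS = {"and", "or", "not"}

def search_terms(query):
    """Split the search string into plain words, minus boolean operators."""
    return [w for w in query.lower().split() if w not in OPERATORS]

def rate(conn, url, query):
    """Sum 1/distance over every pair of search words found for this URL.

    Assumes a hypothetical table word_pairs(url, word_a, word_b, dist)
    holding the precomputed distance between word pairs in each document.
    """
    terms = search_terms(query)
    total = 0.0
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            row = conn.execute(
                "SELECT MIN(dist) FROM word_pairs "
                "WHERE url = ? AND ((word_a = ? AND word_b = ?) "
                "OR (word_a = ? AND word_b = ?))",
                (url, a, b, b, a),
            ).fetchone()
            if row[0]:  # skip pairs that were never recorded
                total += 1.0 / row[0]
    return total

if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: rate.py DATABASE URL "white AND russian"
    db_path, url, query = sys.argv[1:4]
    print(rate(sqlite3.connect(db_path), url, query))
```

htsearch would then only need to read one number per hit and fold it into
its own ranking, so the interface stays generic and the distance logic
stays outside htdig.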

Yes, I know I could parse the results list that htsearch natively returns,
but that approach breaks down when there are many results and they span
multiple pages.

--
Aaron Turner, Core Developer       http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization  http://linuxkb.org/
Because world domination requires quality open documentation.
aka: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]


