I know phrase searching is going to be supported in 3.2, but I was
wondering if there are any plans for rating hits based on word
distance (proximity). Example:
A user searches on "white AND russian" and you have two documents which
contain those two words in the following phrases:
Doc1- Would you like a white russian? (dist = 1)
Doc2- Do you see a russian over there on the white hill? (dist = 5)
In the above examples, Doc1 would have a higher rating.
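
To make the example concrete, here is a minimal Python sketch of the distance
calculation. It is not htdig code; the function names, the tokenizer, and the
1/distance scoring rule are all assumptions made for illustration.

```python
import re


def word_positions(text):
    """Map each lowercased word to the list of its positions in the text."""
    positions = {}
    for index, word in enumerate(re.findall(r"[a-z0-9']+", text.lower())):
        positions.setdefault(word, []).append(index)
    return positions


def min_distance(text, word_a, word_b):
    """Smallest gap, in words, between any occurrence of the two words.

    Returns None if either word is missing from the text.
    """
    positions = word_positions(text)
    pos_a = positions.get(word_a.lower())
    pos_b = positions.get(word_b.lower())
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def proximity_score(distance, max_distance=10):
    """Hypothetical rating: closer words score higher, far or absent words score 0."""
    if distance is None or distance < 1 or distance > max_distance:
        return 0.0
    return 1.0 / distance


doc1 = "Would you like a white russian?"
doc2 = "Do you see a russian over there on the white hill?"
print(min_distance(doc1, "white", "russian"))  # 1
print(min_distance(doc2, "white", "russian"))  # 5
```

With this scoring rule Doc1 gets 1.0 and Doc2 gets 0.2, so Doc1 ranks first.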
I've been toying with this idea in a SQL database. It doesn't take long
before you realize that keeping track of words up to a distance of 10
takes about 20x the disk space of the original document, but I can live
with that (disks are cheap). The only problem is getting htdig to query
my DB to get the distance rating. So what I thought would be really great
is a generic external rating interface: htsearch would call a program
(defined in the config file) and pass it the URL of each hit it finds,
along with the search words ("white AND russian" in the above case).
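
A rough sketch of what such an external rating program could look like is
below. Everything in it is hypothetical: htsearch has no such hook today, and
the table name word_distance, its columns, the database file name, and the
scoring rule are assumptions standing in for whatever the real DB holds.

```python
import itertools
import sqlite3

# Boolean operators that htsearch-style queries may contain; not search words.
OPERATORS = {"and", "or", "not"}


def create_schema(conn):
    """Assumed layout: one row per word pair per document, word_a < word_b."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_distance ("
        "url TEXT, word_a TEXT, word_b TEXT, distance INTEGER)"
    )


def query_terms(query):
    """Strip the boolean operators and return the lowercased search words."""
    return [w for w in query.lower().split() if w not in OPERATORS]


def rate_hit(conn, url, query):
    """Sum 1/distance over every pair of search words found in the document."""
    terms = sorted(set(query_terms(query)))
    total = 0.0
    for word_a, word_b in itertools.combinations(terms, 2):
        row = conn.execute(
            "SELECT MIN(distance) FROM word_distance "
            "WHERE url = ? AND word_a = ? AND word_b = ?",
            (url, word_a, word_b),
        ).fetchone()
        if row[0] is not None and row[0] > 0:
            total += 1.0 / row[0]
    return total


def main(argv):
    """Assumed calling convention: rater <url> <search words>."""
    conn = sqlite3.connect("distances.db")  # hypothetical database file
    print(rate_hit(conn, argv[1], argv[2]))
```

A wrapper script would call main(sys.argv) and htsearch would read the
printed number back as the rating for that hit.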
Yes, I know I could parse the results list that htsearch natively returns,
but that approach breaks down when there are many results and they span
multiple pages.
--
Aaron Turner, Core Developer http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization http://linuxkb.org/
Because world domination requires quality open documentation.