On Thu, 25 Nov 1999, Geoff Hutchison wrote:
> At 10:57 AM -0800 11/25/99, Aaron Turner wrote:
> >I know searching for phrases is going to be supported in 3.2, but I was
> >wondering if there were any plans for rating hits based on word
> >distancing. Example:
>
> Plans? Yes, but if someone took charge of this it would be greatly
> appreciated. I'm sure it would greatly improve scoring.
Well, I don't know C++, but we have one developer on our team who does and
is already hacking htsearch (we'll submit a patch when done). Maybe I can
convince him to take on a new project after this one.
> >I've been toying with this idea in a SQL database. It doesn't take long
> >before you realize to keep track of words up to a distance of 10 takes
>
> I have good news for you. We simply store the location of the words.
> And it doesn't require 20x the disk space. :-) In fact, another
> developer is working on very significant compression.
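A minimal sketch of the "store the location of the words" idea quoted
above, for illustration only. This is not ht://Dig's actual code or data
format; the names build_positions, min_distance and proximity_score are
hypothetical, and the 1/distance weighting is just one assumed way to turn
a word distance into a score.

```python
import re
from collections import defaultdict


def build_positions(text):
    """Map each lowercased word to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(re.findall(r"[a-z0-9']+", text.lower())):
        positions[word].append(i)
    return positions


def min_distance(positions, a, b):
    """Smallest gap between any occurrence of a and any occurrence of b.

    Returns None if either word does not occur in the document.
    """
    if a not in positions or b not in positions:
        return None
    return min(abs(i - j) for i in positions[a] for j in positions[b])


def proximity_score(positions, a, b, max_distance=10):
    """Assumed scoring rule: 1/distance within max_distance, else 0."""
    d = min_distance(positions, a, b)
    if d is None or d == 0 or d > max_distance:
        return 0.0
    return 1.0 / d


positions = build_positions("No matter where you go, there you are.")
print(min_distance(positions, "matter", "are"))
print(proximity_score(positions, "you", "there"))
```

Only one list of integers is stored per word, so the space cost grows with
the number of word occurrences rather than with the distance window. The
distance is computed at search time, which is the CPU cost being traded
against disk space.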
Hmmmm. I guess I'll have to look into what you're doing. I didn't give
much thought to doing it that way because I figured it would be too CPU
intensive to search that way. My solution, after about 30 minutes of
thought, is rather ugly IMHO, but it would be very fast, even for large
sites. Basically, create one table per document (the ugly part), then have
a row for each indexable word in the document, where the key is the word.
Then have 10 columns, where each column holds the words 1 to 10 positions
away. The following sentence would have the following entries in the table:
No matter where you go, there you are.
matter - where - you - - there - you - are
where - matter you - - there - you - are
you - where - matter there - you - are
there - you - are where - matter -
you - are there - - you - where - matter -
are - you - there - - you - where - matter
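The table idea above can be sketched roughly as follows, using an
in-memory dictionary in place of a SQL table. Several things here are
assumptions rather than part of the proposal: the function names are
hypothetical, neighbours on the left and right are folded into a single
column per distance, and "no" and "go" are treated as non-indexable stop
words.

```python
import re
from collections import defaultdict


def build_neighbor_table(text, max_distance=10, stop_words=frozenset()):
    """Build {word: {distance: set of words found at that distance}}.

    Positions count every word, but stop words get no row and are never
    stored as neighbours (an assumption of this sketch).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    table = defaultdict(lambda: defaultdict(set))
    for i, word in enumerate(words):
        if word in stop_words:
            continue
        for d in range(1, max_distance + 1):
            for j in (i - d, i + d):  # look left and right of position i
                if 0 <= j < len(words) and words[j] not in stop_words:
                    table[word][d].add(words[j])
    return table


def distance_from_table(table, a, b):
    """Smallest distance column of row a that contains b, or None."""
    if a not in table:
        return None
    hits = [d for d, neighbors in table[a].items() if b in neighbors]
    return min(hits) if hits else None


table = build_neighbor_table(
    "No matter where you go, there you are.",
    stop_words=frozenset({"no", "go"}),
)
print(sorted(table["matter"][1]))
print(distance_from_table(table, "matter", "are"))
```

A lookup is a single row fetch plus a scan of at most 10 columns, which is
where the speed comes from. The cost is that each word occurrence can be
stored up to 20 times (10 positions on each side).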
--
Aaron Turner, Core Developer http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization http://linuxkb.org/
Because world domination requires quality open documentation.
aka: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]