On Tue, 9 Jun 1998, jmoore wrote:

> The main problem with this approach as outlined, is that the index will be
> at least 3 times the size of the collected documents since the previous
> and next word is stored for each word.  There are probably a lot of
> optimizations that can happen here - the first is to use 2 byte short ints

Hi Jason,

   It seems to me that a more efficient approach would be to store the
offset of each word from some common reference point, say the beginning
of each document. That way the index needs only one position per word
occurrence instead of a previous and next word for each, so storage stays
O(n) in the number of words with a smaller constant. Adjacency can then be
checked by comparing offsets, so you can look up words in any combination
(an added feature for htfuzzy?).
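
A rough sketch of what I mean, in Python for brevity. This is not htdig
code: the names build_index and phrase_match are made up for illustration,
and I am assuming the offset is counted in words from the start of the
document (a byte offset would need a slightly different adjacency test).

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [word offsets from the start of that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: words are whitespace-separated and matched case-insensitively.
        for offset, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(offset)
    return index

def phrase_match(index, words):
    """Return the ids of documents in which the words occur consecutively, in order."""
    words = [w.lower() for w in words]
    if not words:
        return set()
    hits = set()
    # Every occurrence of the first word is a candidate start of the phrase.
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # The i-th word of the phrase must sit at offset start + i.
            if all(start + i in index.get(w, {}).get(doc_id, ())
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits
```

Each word occurrence costs one stored integer, and the same offsets would
serve for "within N words of" queries by loosening the start + i test.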

  -- Edmond

