> Date: Sun, 21 Feb 1999 16:45:48 -0500
> From: Geoff Hutchison <[EMAIL PROTECTED]>

> The idea of plan 3 is that you don't store the location of the word.
> Instead, you store which words are before and after it. Since phrases will
> occur multiple times, this should provide some builtin space savings, since
> you could simply store one record.
>
> Make sense?

Not totally, or I'm confused.

With plan 2, you store every word in a document, either uniquely
with a list of locations (not as you put it) or as separate records
per location (as you put it).

With plan 3 (as I understand it), you similarly store a record for
each unique word in a document, and list the WordID:s of "before"
(and "after").

The speculation is that the before-and-after lists would be smaller
enough compared to the locations list to make up for e.g. losing
the "near" functionality. I don't really know, but offhand "don't
think so". Maybe someone has studied this somewhere?

It seems to me that both plans can take up roughly the same space
(a list of locations or before/after WordID:s), while plan 2 should
be preferred as being less constrained and directly giving more
functionality than plan 3.

brgds, H-P

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
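
The space comparison between the two plans can be made concrete with a small sketch. This is only an illustration under assumptions, not the ht://Dig implementation: the function names (build_plan2, build_plan3, plan2_size, plan3_size, near) are hypothetical, plain strings stand in for WordIDs, and "size" counts list entries rather than bytes on disk.

```python
from collections import defaultdict


def build_plan2(words):
    """Plan 2 (assumed layout): each unique word maps to its list of positions."""
    index = defaultdict(list)
    for pos, word in enumerate(words):
        index[word].append(pos)
    return dict(index)


def build_plan3(words):
    """Plan 3 (assumed layout): each unique word maps to the sets of words
    seen directly before and directly after it. Strings stand in for WordIDs."""
    index = {}
    for pos, word in enumerate(words):
        entry = index.setdefault(word, {"before": set(), "after": set()})
        if pos > 0:
            entry["before"].add(words[pos - 1])
        if pos + 1 < len(words):
            entry["after"].add(words[pos + 1])
    return index


def plan2_size(index):
    """Total number of stored positions (entries, not bytes)."""
    return sum(len(positions) for positions in index.values())


def plan3_size(index):
    """Total number of stored neighbour references (entries, not bytes)."""
    return sum(len(e["before"]) + len(e["after"]) for e in index.values())


def near(plan2_index, a, b, distance):
    """True if a and b occur within `distance` positions of each other.
    This needs the positions of plan 2; plan 3 only knows direct neighbours."""
    return any(
        abs(i - j) <= distance
        for i in plan2_index.get(a, [])
        for j in plan2_index.get(b, [])
    )


if __name__ == "__main__":
    doc = "the cat sat on the mat and the cat ran".split()
    p2 = build_plan2(doc)
    p3 = build_plan3(doc)
    print("plan 2 entries:", plan2_size(p2))
    print("plan 3 entries:", plan3_size(p3))
    print("cat near mat (within 4):", near(p2, "cat", "mat", 4))
```

On this toy document, plan 2 stores 10 position entries and plan 3 stores 16 neighbour entries (8 if only the "after" sets are kept). Plan 3 shrinks only where the same neighbouring pair repeats, so a ten-word example says nothing about real corpora; it just shows what would have to be measured.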
