Re: Unique doc ids

Michael Busch Wed, 23 Jan 2008 01:07:25 -0800

Terry Yang wrote:
> Hi, Michael,
> Your idea is good! But I have a question, and thanks for your help!
>


Hi Terry,

> Can you explain more about how you store the reverse UID-->ID mapping? How do
> you guarantee that a UID can be mapped to the correct dynamic ID? I mean, if a
> doc ID is 5 and then for some reason changes to 60, but you still have UID-->5
> stored in a file or in memory?
> 
> 

Good question!

You can think of a UID as a special, unique term that every document
has. Let's say we have the following segment:

S1:
UID -> ID
  0 ->  0
  1 ->  1
  2 ->  2

Now we flush the segment, add two docs, update the document with UID=1,
add another doc, and then we'll have these two segments:

S1:
UID -> ID
  0 ->  0
  1 ->  1 (deleted)
  2 ->  2

S2:
UID -> ID
  1 ->  2
  3 ->  0
  4 ->  1
  5 ->  3

You can view the UIDs as terms with a posting list, each list containing
just one posting. Now we want to find the ID for UID=1: in the example
both segments contain an entry for UID=1. However, we know that the doc
in S1 with ID=1 is deleted, so we keep looking in the other segment(s)
for the UID until we find an entry whose corresponding ID is not deleted.
There can only be one valid entry at any time for one UID.
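
To make the lookup concrete, here is a minimal sketch in Python. It is not
Lucene code: the Segment class, its fields, and the lookup function are
hypothetical names made up for illustration, and each segment is modelled
as a plain UID -> ID map plus a set of deleted IDs.

```python
from typing import Dict, List, Optional, Set, Tuple


class Segment:
    """Hypothetical stand-in for a segment: a UID -> local ID map plus deletions."""

    def __init__(self, name: str, uid_to_id: Dict[int, int]) -> None:
        self.name = name
        self.uid_to_id = dict(uid_to_id)
        self.deleted: Set[int] = set()  # local IDs marked as deleted


def lookup(segments: List[Segment], uid: int) -> Optional[Tuple[str, int]]:
    """Return (segment name, local ID) of the live document for uid, or None."""
    for seg in segments:
        doc_id = seg.uid_to_id.get(uid)
        # Skip segments that lack the UID or hold only a deleted copy of it.
        if doc_id is not None and doc_id not in seg.deleted:
            return seg.name, doc_id
    return None


# The two segments from the example above.
s1 = Segment("S1", {0: 0, 1: 1, 2: 2})
s2 = Segment("S2", {1: 2, 3: 0, 4: 1, 5: 3})
s1.deleted.add(1)  # updating UID=1 deleted its old copy (ID=1) in S1

print(lookup([s1, s2], 1))  # ('S2', 2)
```

The entry in S1 is found first but skipped because its ID is deleted, so
the search continues and returns the entry in the second segment.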

Of course we shouldn't really use a term + posting list for the UIDs,
because this would be quite inefficient with the data structures we
currently have. We wouldn't want to store the UIDs as Strings, and we
wouldn't need to store e.g. freq or positions. Also, we might be able to
implement some heuristics to optimize the order in which we iterate the
segments for the UID lookup.
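
As a rough illustration of both points, the sketch below stores UIDs as
sorted numeric arrays (no strings, no freq, no positions) and probes the
newest segment first. All names here (CompactSegment, lookup_newest_first)
are hypothetical, and "newest first" is only one possible heuristic; it
assumes that updated documents tend to live in recently written segments.

```python
from array import array
from bisect import bisect_left
from typing import Iterable, List, Optional, Set, Tuple


class CompactSegment:
    """Hypothetical segment holding UIDs and local IDs in parallel sorted arrays."""

    def __init__(self, name: str, pairs: Iterable[Tuple[int, int]]) -> None:
        ordered = sorted(pairs)  # sort by UID so we can binary-search
        self.name = name
        self.uids = array("q", (uid for uid, _ in ordered))     # 64-bit UIDs
        self.ids = array("i", (doc_id for _, doc_id in ordered))  # local IDs
        self.deleted: Set[int] = set()

    def find(self, uid: int) -> Optional[int]:
        """Return the live local ID for uid in this segment, or None."""
        pos = bisect_left(self.uids, uid)
        if pos < len(self.uids) and self.uids[pos] == uid:
            doc_id = self.ids[pos]
            if doc_id not in self.deleted:
                return doc_id
        return None


def lookup_newest_first(
    segments: List[CompactSegment], uid: int
) -> Optional[Tuple[str, int, int]]:
    """segments is ordered oldest to newest.

    Returns (segment name, local ID, number of segments probed), or None.
    """
    probes = 0
    for seg in reversed(segments):
        probes += 1
        doc_id = seg.find(uid)
        if doc_id is not None:
            return seg.name, doc_id, probes
    return None


s1 = CompactSegment("S1", [(0, 0), (1, 1), (2, 2)])
s2 = CompactSegment("S2", [(1, 2), (3, 0), (4, 1), (5, 3)])
s1.deleted.add(1)  # old copy of UID=1

print(lookup_newest_first([s1, s2], 1))  # ('S2', 2, 1)
```

With this ordering the updated document is found after probing a single
segment, while a UID that was never updated (e.g. UID=0) costs two probes.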

I believe this should work?

-Michael
