Re: Unique doc ids

Michael McCandless Wed, 23 Jan 2008 03:34:45 -0800


Michael,

Couldn't we add deleteByQuery to IndexWriter without adding the UID field?

Would that be "enough" to make IndexReader read-only (i.e., do we still really need to delete by docID from IndexWriter?).
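
To make the deleteByQuery idea concrete, here is a toy, in-memory Java sketch. It is not Lucene's IndexWriter; `QueryDeletingWriter`, `deleteByQuery` and `flush` are hypothetical names. The caller only hands over a predicate, and the writer resolves it against its own current docIDs when it flushes, so no docID ever has to come from a reader. Letting a buffered delete apply only to docs added before it was issued is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model, not Lucene's IndexWriter: deletions are expressed
// as queries, so no docID ever crosses the writer's API.
class QueryDeletingWriter {
    // A buffered delete applies only to docs added before it was issued.
    private static final class PendingDelete {
        final Predicate<Map<String, String>> query;
        final int docIDUpto;

        PendingDelete(Predicate<Map<String, String>> query, int docIDUpto) {
            this.query = query;
            this.docIDUpto = docIDUpto;
        }
    }

    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();
    private final List<PendingDelete> pending = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
        deleted.add(false);
    }

    // Buffer the delete; the writer records how many docs it may touch.
    void deleteByQuery(Predicate<Map<String, String>> query) {
        pending.add(new PendingDelete(query, docs.size()));
    }

    // Resolve buffered queries against the writer's own, current docIDs.
    void flush() {
        for (PendingDelete d : pending) {
            for (int docID = 0; docID < d.docIDUpto; docID++) {
                if (d.query.test(docs.get(docID))) {
                    deleted.set(docID, true);
                }
            }
        }
        pending.clear();
    }

    int numDocs() {
        int live = 0;
        for (boolean isDeleted : deleted) {
            if (!isDeleted) live++;
        }
        return live;
    }
}
```

Since only the writer ever turns a query into docIDs, a reader in this model never needs write access.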

If we still need that ... maybe we could extend IndexWriter so that you can hold a lock on docIDs changing while you do your stuff, e.g.:


  writer.freezeDocIDs();
  try {
    // get docIDs from somewhere, then call
    // writer.deleteByDocID(docID) for each one
  } finally {
    writer.unfreezeDocIDs();
  }

If we went that route, we'd need to expose methods in IndexWriter to let you get reader(s) and then delete by docID.
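
A toy Java sketch of the freeze idea, assuming that a merge (modeled here by `compact()`) is the only thing that renumbers docIDs. `FreezableWriter` and all of its methods are hypothetical, borrowing the names from the snippet above; none of this is an existing Lucene API. While the freeze count is non-zero, a requested compaction is deferred, so docIDs looked up inside the try block stay valid for `deleteByDocID`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer; freezeDocIDs/unfreezeDocIDs/deleteByDocID are the
// names proposed in the mail, not an existing Lucene API.
class FreezableWriter {
    private final List<String> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();
    private int freezeCount = 0;
    private boolean compactPending = false;

    void addDocument(String doc) {
        docs.add(doc);
        deleted.add(false);
    }

    void freezeDocIDs() {
        freezeCount++;
    }

    // Releasing the last freeze runs any compaction that was deferred.
    void unfreezeDocIDs() {
        if (freezeCount == 0) throw new IllegalStateException("not frozen");
        freezeCount--;
        if (freezeCount == 0 && compactPending) compact();
    }

    // Stand-in for "get reader(s)": docIDs are only meaningful while frozen.
    List<Integer> docIDsOf(String doc) {
        List<Integer> ids = new ArrayList<>();
        for (int docID = 0; docID < docs.size(); docID++) {
            if (!deleted.get(docID) && docs.get(docID).equals(doc)) ids.add(docID);
        }
        return ids;
    }

    void deleteByDocID(int docID) {
        if (freezeCount == 0) throw new IllegalStateException("docIDs not frozen");
        deleted.set(docID, true);
    }

    // Merge stand-in: dropping deleted docs renumbers the survivors,
    // so it is postponed while any freeze is held.
    void compact() {
        if (freezeCount > 0) {
            compactPending = true;
            return;
        }
        for (int docID = docs.size() - 1; docID >= 0; docID--) {
            if (deleted.get(docID)) {
                docs.remove(docID);
                deleted.remove(docID);
            }
        }
        compactPending = false;
    }

    int maxDoc() {
        return docs.size();
    }
}
```

A counter rather than a boolean lets freezes nest; the deferred compaction only runs once the last freeze is released.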

I'm not certain this will work :) I'm just throwing alternative ideas out...

I do like the idea of a UID field, but I'm a bit nervous about having the "core" maintain it and then have things in the core that depend on its presence. At first it might be optional, but I could see us adding more and more functionality over time that requires the UID to be present, to the point where it's eventually not really optional...


Mike

Michael Busch wrote:

Paul Elschot wrote:
Michael,

How would IndexWriter.addIndexes() work with unique doc ids?
Hi Paul,

It would probably be a limitation of this design. The only way I can
think of right now to ensure that the UIDs don't change during an addIndexes() is an API in IndexWriter like setMinUID(long). When you create an
index and you know that you'll add it to another one via addIndexes(),
then you could use this method to set the min UID value in that index to the max number of add/update operations you'd expect in the other index.
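
A small Java sketch of how setMinUID(long) might keep UIDs distinct across addIndexes(), assuming every add/update draws the next UID from a per-index counter. `UidIndex`, `setMinUID` and `addIndexes` here are hypothetical toy methods, not Lucene's. The collision checks show the limitation: if the target index performs more operations than the reserved range allows, its counter runs into the imported UIDs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy index: each add/update operation draws the next UID
// from a per-index counter.
class UidIndex {
    private final Map<Long, String> docsByUid = new LinkedHashMap<>();
    private long nextUID = 0;

    // Proposed in the mail as IndexWriter.setMinUID(long); hypothetical here.
    // Must be called before the first add.
    void setMinUID(long minUID) {
        if (!docsByUid.isEmpty()) throw new IllegalStateException("set before adding docs");
        nextUID = minUID;
    }

    long addDocument(String doc) {
        long uid = nextUID++;
        // Limitation: more ops than reserved collides with imported UIDs.
        if (docsByUid.containsKey(uid)) {
            throw new IllegalStateException("UID range exhausted: " + uid);
        }
        docsByUid.put(uid, doc);
        return uid;
    }

    // Copy the other index's docs, keeping their UIDs unchanged.
    void addIndexes(UidIndex other) {
        for (Long uid : other.docsByUid.keySet()) {
            if (docsByUid.containsKey(uid)) {
                throw new IllegalStateException("UID collision: " + uid);
            }
        }
        docsByUid.putAll(other.docsByUid);
    }

    String get(long uid) {
        return docsByUid.get(uid);
    }

    int size() {
        return docsByUid.size();
    }
}
```

For example, a side index created with `setMinUID(1000)` can be added to a target that is expected to perform fewer than 1000 add/update operations; an index created without it would clash at UID 0.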
Please note that the UIDs that I'm thinking about here would actually
not affect the index order. All postings would still be stored in
(dynamic) doc id order.
This means that with this design the search results would not be returned in
UID order, so the UIDs couldn't be used efficiently, e.g., for a join
operation with an external data structure (e.g., a database). I think in
this regard my proposed UID design differs from what was discussed here
some time ago.
The main use case here is to get rid of readers that do write operations. I think that this would be very desirable when we implement updateable column-fields. Then you could use the UIDs that an IndexReader returned
to delete or update docs or the column fields/norms, and you wouldn't
have to worry about IndexReaders being "in sync" with the IndexWriters.
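
A toy Java sketch of why stable UIDs would free readers from staying "in sync": a snapshot returns UIDs, the writer later merges (which renumbers docIDs), and a writer-side delete by UID still hits the intended document. `UidWriter` and its methods are hypothetical; the linear UID lookup stands in for whatever UID-to-docID mapping a real index would need. As described above, docs stay in docID order, so hits come back in docID order rather than UID order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene: a doc's docID is its current position
// (postings stay in docID order), while its UID is a stable label stored with it.
class UidWriter {
    private final List<Long> uids = new ArrayList<>();
    private final List<String> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();
    private long nextUID = 0;

    long addDocument(String doc) {
        uids.add(nextUID);
        docs.add(doc);
        deleted.add(false);
        return nextUID++;
    }

    // What a read-only reader could hand back: UIDs of matching live docs,
    // returned in docID order, not UID order.
    List<Long> searchUIDs(String doc) {
        List<Long> hits = new ArrayList<>();
        for (int docID = 0; docID < docs.size(); docID++) {
            if (!deleted.get(docID) && docs.get(docID).equals(doc)) {
                hits.add(uids.get(docID));
            }
        }
        return hits;
    }

    // Writer-side delete keyed by UID; the linear scan maps UID -> current docID.
    boolean deleteByUID(long uid) {
        int docID = uids.indexOf(uid);
        if (docID < 0 || deleted.get(docID)) return false;
        deleted.set(docID, true);
        return true;
    }

    // Merge stand-in: dropping deleted docs renumbers the surviving docIDs.
    void merge() {
        for (int docID = docs.size() - 1; docID >= 0; docID--) {
            if (deleted.get(docID)) {
                uids.remove(docID);
                docs.remove(docID);
                deleted.remove(docID);
            }
        }
    }

    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (int docID = 0; docID < docs.size(); docID++) {
            if (!deleted.get(docID)) live.add(docs.get(docID));
        }
        return live;
    }
}
```

With a stale docID instead, the same sequence would go wrong: if "a", "b", "c" are added and "a" is deleted and merged away, "c" moves from docID 2 to docID 1 and docID 2 no longer exists, while its UID is unchanged.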
Maybe this UID design that I'm thinking out loud about here is total
overkill for the mentioned use cases. I'm open to and interested in other
alternative ideas!
-Michael
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


