Michael,

Couldn't we add deleteByQuery to IndexWriter without adding the UID field?

Would that be "enough" to make IndexReader read-only (ie, do we still really need to delete by docID from IndexWriter?).

If we still need that ... maybe we could extend IndexWriter so that you can hold a lock on docIDs changing while you do your stuff, eg:

  writer.freezeDocIDs();
  try {
    // get docIDs from somewhere, then call writer.deleteByDocID(docID) for each
  } finally {
    writer.unfreezeDocIDs();
  }

If we went that route, we'd need to expose methods in IndexWriter to let you get reader(s) and then delete by docID.
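
To make the freeze idea a bit more concrete, here is a rough, self-contained sketch. Everything in it is an assumption for illustration: FreezableWriter is a toy class, not IndexWriter, and freezeDocIDs, unfreezeDocIDs, deleteByDocID and tryMerge are hypothetical names. It models the freeze as a shared read lock and anything that renumbers docIDs (a merge, here) as needing the exclusive write lock.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy stand-in for a writer. Hypothetical: this is NOT Lucene's IndexWriter.
class FreezableWriter {
    private List<String> docs = new ArrayList<>();   // list position = docID
    private final BitSet deleted = new BitSet();
    // Read lock = "docIDs frozen" (shared); write lock = "renumbering in progress".
    private final ReentrantReadWriteLock docIdLock = new ReentrantReadWriteLock();

    synchronized int addDocument(String doc) {
        docs.add(doc);                 // appending never changes existing docIDs
        return docs.size() - 1;
    }

    void freezeDocIDs()   { docIdLock.readLock().lock(); }
    void unfreezeDocIDs() { docIdLock.readLock().unlock(); }

    // Only legal while the calling thread holds a freeze, so the docID is still valid.
    synchronized void deleteByDocID(int docID) {
        if (docIdLock.getReadHoldCount() == 0) {
            throw new IllegalStateException("call freezeDocIDs() first");
        }
        deleted.set(docID);
    }

    // Stand-in for a merge: drops deleted docs and renumbers the rest.
    // Returns false without doing anything while any freeze is held.
    boolean tryMerge() {
        if (!docIdLock.writeLock().tryLock()) {
            return false;
        }
        try {
            synchronized (this) {
                List<String> kept = new ArrayList<>();
                for (int i = 0; i < docs.size(); i++) {
                    if (!deleted.get(i)) kept.add(docs.get(i));
                }
                docs = kept;
                deleted.clear();
            }
            return true;
        } finally {
            docIdLock.writeLock().unlock();
        }
    }

    synchronized int numDocs() { return docs.size(); }
}
```

In this toy version a merge simply backs off while a freeze is held; a real implementation would have to decide whether merges block, queue up, or keep running and only defer the docID-changing commit step.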

I'm not certain this will work :) I'm just throwing alternative ideas out...

I do like the idea of a UID field, but I'm a bit nervous about having the "core" maintain it and then have things in the core that depend on its presence. At first it might be optional, but I could see us over time making more and more functionality require the UID to be present, to the point where it's eventually not really optional...

Mike

Michael Busch wrote:

Paul Elschot wrote:
Michael,

How would IndexWriter.addIndexes() work with unique doc ids?

Hi Paul,

it would probably be a limitation of this design. The only way I can
think of right now to ensure that during an addIndexes() the UIDs don't change is an API in IndexWriter like setMinUID(long). When you create an
index and you know that you'll add it to another one via addIndexes(),
then you could use this method to set the min UID value in that index to the max number of add/update operations you'd expect in the other index.
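
A toy sketch of how that could look, under stated assumptions: UidIndex is a made-up class, not Lucene code, and setMinUID(long) is only the hypothetical API named above. The model assigns UIDs from a counter, lets an empty index start its counter at a chosen minimum, and makes addIndexes() refuse to merge when UID ranges collide.

```java
import java.util.TreeMap;

// Toy model of an index that assigns UIDs. Hypothetical, not Lucene code.
class UidIndex {
    private final TreeMap<Long, String> docsByUid = new TreeMap<>();
    private long nextUid = 0;

    // Hypothetical setMinUID(long): only allowed before any document is added.
    void setMinUID(long minUid) {
        if (!docsByUid.isEmpty()) {
            throw new IllegalStateException("index already contains documents");
        }
        nextUid = minUid;
    }

    long addDocument(String doc) {
        long uid = nextUid++;
        docsByUid.put(uid, doc);
        return uid;
    }

    // Copies the other index's docs in, keeping their UIDs unchanged.
    void addIndexes(UidIndex other) {
        for (Long uid : other.docsByUid.keySet()) {
            if (docsByUid.containsKey(uid)) {
                throw new IllegalArgumentException("UID collision: " + uid);
            }
        }
        docsByUid.putAll(other.docsByUid);
        // Assumption: keep handing out UIDs above everything seen so far.
        nextUid = Math.max(nextUid, other.nextUid);
    }

    String get(long uid) { return docsByUid.get(uid); }
    int size() { return docsByUid.size(); }
}
```

Usage would be something like: the side index calls setMinUID(1000) before adding anything, the main index stays below 1000 add/update operations, and main.addIndexes(side) then succeeds with all UIDs unchanged.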

Please note that the UIDs that I'm thinking about here would actually
not affect the index order. All postings would still be stored in
(dynamic) doc id order.
This means, with this design the search results would not be returned in
UID order, so the UIDs couldn't be used efficiently, e.g. for a join
operation with an external data structure (e.g. a database). I think in
this regard my proposed UID design differs from what was discussed here
some time ago.

The main use case here is to get rid of readers that do write operations. I think that this would be very desirable when we implement updateable column-fields. Then you could use the UIDs that an IndexReader returned
to delete or update docs or the column fields/norms, and you wouldn't
have to worry about IndexReaders being "in sync" with the IndexWriters.
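
As a minimal illustration of why this removes the "in sync" problem, here is a toy sketch. UidWriter, deleteByUID and docIDOf are hypothetical names, not existing Lucene APIs, and the linear scan is only there to keep the example short. A UID handed out earlier still identifies the same document after other deletes have shifted the docIDs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy writer that owns all deletes and addresses docs by UID. Hypothetical.
class UidWriter {
    private final List<Long> uids = new ArrayList<>();    // position = current docID
    private final List<String> docs = new ArrayList<>();
    private long nextUid = 0;

    long addDocument(String doc) {
        uids.add(nextUid);
        docs.add(doc);
        return nextUid++;
    }

    // Deletes by UID; later docs shift down, i.e. their docIDs change.
    boolean deleteByUID(long uid) {
        int docID = uids.indexOf(uid);   // linear scan, fine for a sketch
        if (docID < 0) {
            return false;
        }
        uids.remove(docID);
        docs.remove(docID);
        return true;
    }

    // Current (dynamic) docID of a UID, or -1 if it no longer exists.
    int docIDOf(long uid) { return uids.indexOf(uid); }

    String get(long uid) {
        int docID = uids.indexOf(uid);
        return docID < 0 ? null : docs.get(docID);
    }
}
```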

Maybe this UID design that I'm thinking out loud about here is total
overkill for the mentioned use cases. I'm open to and interested in
alternative ideas!
-Michael



Regards,
Paul Elschot



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


