Michael,
Couldn't we add deleteByQuery to IndexWriter without adding the UID
field?
Would that be "enough" to make IndexReader read-only (i.e., do we still
really need to delete by docID from IndexWriter)?
If we still need that ... maybe we could extend IndexWriter so that
you can hold a lock on docIDs changing while you do your stuff, eg:
  writer.freezeDocIDs();
  try {
    // get docIDs from somewhere & call writer.deleteByDocID
  } finally {
    writer.unfreezeDocIDs();
  }
If we went that route, we'd need to expose methods in IndexWriter to
let you get reader(s), and, to then delete by docID.
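To make the freeze idea a bit more concrete, here is a rough,
self-contained sketch. Everything in it is hypothetical: the class
FreezableWriterSketch, tryCompact() and maxDoc() are stand-ins for
illustration, and freezeDocIDs/unfreezeDocIDs/deleteByDocID are only the
names proposed above, not existing IndexWriter methods. It assumes a
read/write lock where holding the read side means "docIDs are frozen"
and anything that renumbers docs (a merge, modelled here as a
compaction) needs the write side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for IndexWriter; not a Lucene API.
class FreezableWriterSketch {
    // Read lock held = docIDs frozen; write lock held = docIDs being renumbered.
    private final ReentrantReadWriteLock idLock = new ReentrantReadWriteLock();
    // List position plays the role of the docID; null marks a deleted doc.
    private final List<String> docs = new ArrayList<>();

    synchronized int addDocument(String doc) {
        docs.add(doc); // appending never renumbers existing docs
        return docs.size() - 1;
    }

    void freezeDocIDs() {
        idLock.readLock().lock();
    }

    void unfreezeDocIDs() {
        idLock.readLock().unlock();
    }

    // Only legal while the calling thread holds a freeze.
    synchronized void deleteByDocID(int docID) {
        if (idLock.getReadHoldCount() == 0) {
            throw new IllegalStateException("docIDs are not frozen by this thread");
        }
        docs.set(docID, null);
    }

    // Simulates a merge that drops deleted docs and so renumbers the rest.
    // Returns false without doing anything while any thread holds a freeze.
    boolean tryCompact() {
        if (!idLock.writeLock().tryLock()) {
            return false;
        }
        try {
            synchronized (this) {
                docs.removeIf(d -> d == null);
            }
            return true;
        } finally {
            idLock.writeLock().unlock();
        }
    }

    synchronized int maxDoc() {
        return docs.size();
    }
}
```

In this sketch several threads can hold a freeze at once, and a merge
simply backs off while any freeze is held. A real writer would more
likely defer the merge than skip it.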
I'm not certain this will work :) I'm just throwing alternative
ideas out...
I do like the idea of a UID field, but I'm a bit nervous about having
the "core" maintain it and then have things in the core that depend
on its presence. At first it might be optional, but I could see us
over time adding more and more functionality that requires the UID to
be present, to the point where it's eventually not really optional...
Mike
Michael Busch wrote:
Paul Elschot wrote:
Michael,
How would IndexWriter.addIndexes() work with unique doc ids?
Hi Paul,
it would probably be a limitation of this design. The only way I can
think of right now to ensure that during an addIndexes() the UIDs don't
change is an API in IndexWriter like setMinUID(long). When you create an
index and you know that you'll add it to another one via addIndexes(),
then you could use this method to set the min UID value in that index to
the max number of add/update operations you'd expect in the other index.
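As a rough illustration of the setMinUID(long) idea, here is a small
self-contained model. It is only a sketch under assumptions:
UidIndexSketch and its methods are hypothetical, each add operation is
assumed to consume one UID from a simple counter, and addIndexes() is
assumed to keep the incoming UIDs as they are and to reject overlapping
ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an index that hands out one UID per add operation.
class UidIndexSketch {
    private long nextUid = 0;
    // List position = current docID, value = the doc's UID.
    private final List<Long> uids = new ArrayList<>();

    // Reserves the range below minUid for another index.
    // Must be called before any document is added.
    void setMinUID(long minUid) {
        if (!uids.isEmpty()) {
            throw new IllegalStateException("setMinUID must be called on an empty index");
        }
        nextUid = minUid;
    }

    long addDocument() {
        long uid = nextUid++;
        uids.add(uid);
        return uid;
    }

    // Appends the other index's docs, keeping their UIDs unchanged.
    // Fails if any incoming UID is already in use here. This index's own
    // counter is left alone, on the assumption that it stays below the
    // minimum reserved in the other index.
    void addIndexes(UidIndexSketch other) {
        for (long uid : other.uids) {
            if (uids.contains(uid)) {
                throw new IllegalArgumentException("UID collision: " + uid);
            }
        }
        uids.addAll(other.uids);
    }

    List<Long> uids() {
        return new ArrayList<>(uids);
    }
}
```

With this model, an index built with setMinUID(1000) can be added to an
index that has used fewer than 1000 UIDs without any UID changing, while
an index built without it collides on UID 0.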
Please note that the UIDs that I'm thinking about here would actually
not affect the index order. All postings would still be stored in
(dynamic) doc id order.
This means, with this design the search results would not be returned in
UID order, so the UIDs couldn't be used efficiently e.g. for a join
operation with an external data structure (e.g. database). I think in
this regard my proposed UID design differs from what was discussed here
some time ago.
The main use case here is to get rid of readers that do write
operations.
I think that this would be very desirable when we implement updatable
column-fields. Then you could use the UIDs that an IndexReader returned
to delete or update docs or the column fields/norms, and you wouldn't
have to worry about IndexReaders being "in sync" with the IndexWriters.
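A tiny model of why a stable UID removes the "in sync" worry: docIDs
shift when earlier docs go away, but a UID handed out earlier still
names the same doc. This is only a sketch under assumptions;
UidDeleteSketch, deleteByUID and docIDOf are hypothetical names, and a
real implementation would not scan a list to resolve a UID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: list position = current (dynamic) docID, value = UID.
class UidDeleteSketch {
    private final List<Long> live = new ArrayList<>();
    private long nextUid = 0;

    long addDocument() {
        live.add(nextUid);
        return nextUid++;
    }

    // Removing a doc shifts the docIDs of everything after it,
    // but never changes any UID. Returns false if the UID is not live.
    boolean deleteByUID(long uid) {
        return live.remove(Long.valueOf(uid));
    }

    // Current docID for a UID, or -1 if the doc has been deleted.
    int docIDOf(long uid) {
        return live.indexOf(uid);
    }
}
```

A caller that remembered a docID from an old reader would hit the wrong
doc after the first delete, whereas a remembered UID keeps working.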
Maybe this UID design that I'm thinking out loud about here is total
overkill for the mentioned use cases. I'm open to and interested in
alternative ideas!
-Michael
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]