When I update a document in Lucene (i.e., re-indexing), I have to delete
the existing document, and create a new one. My understanding is that this
assigns a new doc ID for the newly created document. If that is the case,
is it true that the system can rather quickly run out of doc ID space
(about 2 billion, since the doc ID is a 32-bit integer) if an application's
update frequency is extremely high?
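
To put a rough number on the concern, here is a back-of-the-envelope sketch.
It assumes the very thing I am asking about, namely that doc IDs are handed
out sequentially and never reused. The class name, method name, and the
update rate of 1,000 per second are hypothetical and only for illustration;
nothing here comes from the Lucene API.

```java
public class DocIdBudget {
    // Assumption: one new doc ID is consumed per update and none is ever reused.
    static final long MAX_DOC_IDS = Integer.MAX_VALUE; // 2,147,483,647

    /** Days until the ID space would be used up at a constant update rate. */
    static double daysUntilExhausted(long updatesPerSecond) {
        double seconds = (double) MAX_DOC_IDS / updatesPerSecond;
        return seconds / 86_400.0; // 86,400 seconds per day
    }

    public static void main(String[] args) {
        long rate = 1_000; // hypothetical update rate, in updates per second
        System.out.printf("At %d updates/s: about %.1f days%n",
                rate, daysUntilExhausted(rate));
    }
}
```

Under that assumption, a sustained 1,000 updates per second would use up the
ID space in roughly 25 days, which is why I am worried.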

So, my questions are:

1. Does Lucene always increment the doc ID for a newly created document
(hence the risk of running out of ID space), just like an auto-increment
column in a database does? Or does it re-use numbers that are currently
not in use (i.e., IDs that were once assigned but have since been
deleted)?

2. If Lucene can recycle old IDs, it would be even better if I could force
it to re-use a particular doc ID when updating a document by deleting the
old one and creating a new one. This scheme would allow me to reference
this doc ID from another doc in the index as if it were a foreign key value
that doesn't change upon re-indexing. I didn't see anything like this in the
API, but is it possible at all?

3. If Lucene does not recycle old IDs, how do people deal with this issue
when designing a system with extremely high re-indexing frequency?

Thanks in advance for any help.
/Jong
