On Fri, May 11, 2012 at 7:56 AM, Jong Kim <jong.luc...@gmail.com> wrote:
> When I update a document in Lucene (i.e., re-indexing), I have to delete
> the existing document, and create a new one. My understanding is that this
> assigns a new doc ID for the newly created document. If that is the case,
> is it true that the system can rather quickly run out of doc ID space
> (which is about 2 billion since doc ID data type is integer) if the update
> frequency is extremely high in an application?

The document IDs in Lucene are per segment, i.e. they are always relative to a segment. There is certainly a limitation here, which shows up in two places:

1. in the API, i.e. all methods accepting internal doc IDs expect int, not long;
2. on the segment level.

Basically, you are going to run into problems if you have more than Integer.MAX_VALUE documents in one index. You can work around that if everything is done "per-segment"; in that case the limitation only applies to a single segment. Running out of IDs won't be an issue, as they are all relative to their segment, i.e. you can update a single document forever and never run out of IDs.

>
> So, my question is -
>
> 1. Does Lucene always increment the doc ID for newly created document
> (hence, the risk of running out of ID space) just like auto increment
> column in the database does? Or does it re-use the numbers that are
> currently not in use (i.e. those IDs that were once assigned but since
> deleted)?
>
> 2. If Lucene can recycle old IDs, it would be even better if I could force
> it to re-use a particular doc ID when updating a document by deleting old
> one and creating new one. This scheme will allow me to reference this doc
> ID from another doc in the index as if it was a foreign key value that
> doesn't change upon reindexing. I didn't see anything like this in the API,
> but is it ever possible?
>
> 3. If Lucene does not recycle old IDs, how do people deal with this issue
> when designing a system with extremely high re-indexing frequency?

The Lucene internal IDs should not be used in the application integrating Lucene, or at least not in the way you would use an auto-incremented primary key in a DB. You can specify your own "id" field and reuse those IDs (you actually have to if you want to update). Does that make sense?
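
To make the point about internal IDs versus your own "id" field concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class `ToyIndex` and its methods `update`, `merge`, `internalIdOf`, `bodyOf` and `maxDoc` are hypothetical names invented for illustration, and the list position merely stands in for an internal doc number. It only assumes what is described above, namely that an update is a delete plus an add, and that deleted slots are reclaimed (and survivors renumbered) when segments are merged.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Lucene): a document's internal number is its position in a list. */
class ToyIndex {
    // Each slot holds {applicationId, body}; null marks a deleted document.
    private final List<String[]> slots = new ArrayList<>();

    /** Update = delete the live document with this application id, then append the new version. */
    int update(String appId, String body) {
        for (int i = 0; i < slots.size(); i++) {
            String[] d = slots.get(i);
            if (d != null && d[0].equals(appId)) {
                slots.set(i, null); // mark the old version as deleted
            }
        }
        slots.add(new String[] {appId, body});
        return slots.size() - 1; // internal number of the new version
    }

    /** Rough analogue of a merge: drop deleted slots, which renumbers the survivors. */
    void merge() {
        slots.removeIf(d -> d == null);
    }

    /** Internal number currently held by the live document with this application id, or -1. */
    int internalIdOf(String appId) {
        for (int i = 0; i < slots.size(); i++) {
            String[] d = slots.get(i);
            if (d != null && d[0].equals(appId)) {
                return i;
            }
        }
        return -1;
    }

    /** Body of the live document with this application id, or null if there is none. */
    String bodyOf(String appId) {
        int i = internalIdOf(appId);
        return i < 0 ? null : slots.get(i)[1];
    }

    /** Number of slots in use, including deleted ones. */
    int maxDoc() {
        return slots.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int last = -1;
        for (int i = 0; i < 1000; i++) {
            last = index.update("user-42", "version " + i);
        }
        System.out.println(last);                          // 999
        System.out.println(index.maxDoc());                // 1000
        index.merge();
        System.out.println(index.maxDoc());                // 1
        System.out.println(index.internalIdOf("user-42")); // 0
        System.out.println(index.bodyOf("user-42"));       // version 999
    }
}
```

The internal number of "user-42" moves from 999 to 0 across the merge, while the application-level id stays the same, which is why references between documents should go through your own id field. In real Lucene the delete-then-add step corresponds to `IndexWriter.updateDocument`, called with a `Term` on that id field.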
simon

>
> Thanks in advance for help
> /Jong