Just add another field to each document that holds your "external" document identifier. That is essentially what the request is asking for: another layer of indirection between identifiers and physical locations in the index.
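To make the indirection concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: `ExternalIdIndex` and all of its methods are hypothetical names invented for illustration. The toy index assigns internal docIDs sequentially, the way the original message describes Lucene doing, and keeps the application's own identifier as an extra per-document value with a lookup table on top. In Lucene itself the analogous step would presumably be an untokenized field holding the identifier, looked up by term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index, not part of Lucene. Internal docIDs are dense and
// depend on insertion order; external IDs are chosen by the application and
// may be sparse and arbitrarily large.
class ExternalIdIndex {
    // Position in these lists is the internal docID.
    private final List<String> contents = new ArrayList<>();
    private final List<Long> externalIds = new ArrayList<>();
    // The layer of indirection: external ID -> internal docID.
    private final Map<Long, Integer> externalToInternal = new HashMap<>();

    // Adds a document and returns the internal docID assigned to it.
    int add(long externalId, String content) {
        if (externalToInternal.containsKey(externalId)) {
            throw new IllegalArgumentException("duplicate external ID: " + externalId);
        }
        int internalId = contents.size();
        contents.add(content);
        externalIds.add(externalId);
        externalToInternal.put(externalId, internalId);
        return internalId;
    }

    // Returns the internal docID for an external ID, or -1 if it is unknown.
    int internalIdOf(long externalId) {
        Integer internalId = externalToInternal.get(externalId);
        return internalId == null ? -1 : internalId;
    }

    // Returns the external ID stored with an internal docID.
    long externalIdOf(int internalId) {
        return externalIds.get(internalId);
    }

    // Fetches a document by its stable external ID, or null if it is unknown.
    String getByExternalId(long externalId) {
        int internalId = internalIdOf(externalId);
        return internalId < 0 ? null : contents.get(internalId);
    }
}
```

With this arrangement the same document can get different internal docIDs in two indices built in different orders, yet it is always reachable through the same external identifier, and gaps between external IDs cost nothing because no array is sized by them. What it does not do is change the order of the postings, which still follows the internal docIDs.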

-----Original Message-----
From: Shane O'Sullivan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 11, 2005 10:59 AM
To: [email protected]
Subject: Are Non-consecutive Document IDs feasible?

Hi all,

As far as I understand today, Lucene assigns docIDs to documents according to the order in which the documents are added to the index. Hence, docIDs are assigned by the engine in a sequential manner, without gaps. This order of document identifiers then determines the order of the postings in the postings lists, i.e. all postings lists are sorted by docID. It also means that the same document appearing in two different indices would probably not have the same docID (unless some extreme care was taken to insert documents in the same order).

There are situations where the application wants to determine the docID for the index, i.e. to control the ordering of occurrences in the postings lists. This is useful to ensure, for example, that a document has a stable and consistent document identifier regardless of insertion order to an index. In either case, the application would want to pass into the index the numeric identifier of the document. However, such identifiers may not be sequential, i.e. it's possible that there would be a document with docID M without there being any document whose docID is M-1.

Q1. How difficult would it be to change Lucene to accept the docIDs from the application, and not care about any possible gaps those ids may have? One possible problem is that since the Doc Ids could become very large, and are non-sequential, creating a single array for them all would not be feasible.

Q2. Does Lucene's search code depend on the fact that document IDs are sequential?

Thanks
Shane
