How about indexing a field with your application-centric id? This is
_the_ way this sort of thing is handled. You could then query for a
specific id using a TermQuery.
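To make the idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class AppIdIndex and its methods are hypothetical names made up for illustration. The internal docID is still handed out in insertion order, while the application id lives in its own exact-match "id" field, which is roughly what a TermQuery on a Term("id", ...) would look up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT Lucene): internal docIDs follow insertion order, and the
// application id is kept as an exact-match term in a separate "id" field.
class AppIdIndex {
    // Position in this list is the internal docID; the value is the stored application id.
    private final List<String> storedIds = new ArrayList<>();
    // Term dictionary for the "id" field: application id -> internal docID.
    private final Map<String, Integer> idTerms = new HashMap<>();

    // Adds a document and returns the internal docID assigned to it.
    int addDocument(String appId) {
        if (idTerms.containsKey(appId)) {
            throw new IllegalArgumentException("duplicate application id: " + appId);
        }
        int docId = storedIds.size(); // sequential, no gaps
        storedIds.add(appId);
        idTerms.put(appId, docId);
        return docId;
    }

    // Exact-match lookup on the "id" field; returns -1 when no document has that id.
    int findByAppId(String appId) {
        Integer docId = idTerms.get(appId);
        return docId == null ? -1 : docId;
    }

    // Returns the application id stored with an internal docID.
    String appIdOf(int docId) {
        return storedIds.get(docId);
    }
}
```

The application ids can be as large and as sparse as you like, because they are only terms. Two indices built in different insertion orders will hand back different internal docIDs for the same application id, but the lookup by application id works the same in both.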
Erik
On Oct 11, 2005, at 11:58 AM, Shane O'Sullivan wrote:
Hi all,
As far as I understand today, Lucene assigns docIDs to documents
according to the order in which the documents are added to the index.
Hence, docIDs are assigned by the engine in a sequential manner,
without gaps. This order of document identifiers then determines the
order of the postings in the postings lists, i.e. all postings lists
are sorted by docID. It also means that the same document appearing in
two different indices would probably not have the same docID (unless
some extreme care was taken to insert documents in the same order).
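As a rough illustration of the behaviour described above, here is a toy inverted index in plain Java. It is only a sketch of my understanding, not Lucene's actual implementation, and the names ToyInvertedIndex, addDocument and postingsFor are hypothetical. Because docIDs are handed out in increasing order and each posting is appended as its document arrives, every postings list ends up sorted by docID without any explicit sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (NOT Lucene): docIDs are assigned sequentially on insertion.
class ToyInvertedIndex {
    private int nextDocId = 0;
    // term -> list of docIDs containing that term
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Adds a whitespace-tokenized document and returns its docID.
    int addDocument(String text) {
        int docId = nextDocId++; // sequential, no gaps
        // TreeSet removes duplicate terms so each document is posted once per term.
        for (String term : new TreeSet<>(Arrays.asList(text.trim().toLowerCase().split("\\s+")))) {
            if (term.isEmpty()) {
                continue; // an empty document yields one empty token; skip it
            }
            // Appending keeps each list sorted, since docIDs only ever increase.
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Returns the postings list for a term (empty if the term is unknown).
    List<Integer> postingsFor(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }
}
```

Adding the same three documents in a different order gives each of them a different docID, which is the instability across indices mentioned above.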
There are situations where the application wants to determine the
docID for the index, i.e. to control the ordering of occurrences in the
postings lists. This is useful to ensure, for example, that a document
has a stable and consistent document identifier regardless of insertion
order to an index.
In either case, the application would want to pass into the index the
numeric identifier of the document. However, such identifiers may not
be sequential, i.e. it's possible that there would be a document with
docID M without there being any document whose docID is M-1.
Q1. How difficult would it be to change Lucene to accept the docIDs
from the application, and not care about any possible gaps those ids
may have? One possible problem is that, since the docIDs could become
very large and are non-sequential, creating a single array indexed by
docID for all of them would not be feasible.
Q2. Does Lucene's search code depend on the fact that document IDs are
sequential?
Thanks
Shane