Yes, docIDs are currently assigned sequentially, starting with 0.
BUT: if addDocument hits an exception (say, in your analyzer), it will usually
still use up a docID (and then immediately mark it as deleted).
Also, this behavior isn't "promised" in the API, i.e. it could in theory
(though I think it unlikely) change in a future release of Lucene.
And remember that when a merge (or an optimize) completes, any deleted docs
are dropped and all docIDs after them "collapse down" to fill the gaps.
Mike
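The docID behavior described in the reply above can be pictured with a toy model. This is a minimal sketch in plain Java, not Lucene code: the class DocIdSketch and its methods are hypothetical, documents are plain strings, and a boolean flag stands in for an analyzer that throws. It shows sequential assignment from 0, a failed document still consuming a docID and being marked deleted, and the old-to-new docID mapping after a merge drops the deleted documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical toy model (not Lucene's implementation) of docID assignment
 * and of how a merge/optimize renumbers documents after deletions.
 */
public class DocIdSketch {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    /** Returns the docID assigned; a document whose analysis "throws" still uses one up. */
    public int addDocument(String doc, boolean analyzerThrows) {
        int docId = docs.size();   // next sequential docID, starting at 0
        docs.add(doc);
        if (analyzerThrows) {
            deleted.set(docId);    // docID is consumed, then marked deleted
        }
        return docId;
    }

    /** Maps each old docID to its docID after deleted docs are merged away (-1 if dropped). */
    public int[] mergeMapping() {
        int[] map = new int[docs.size()];
        int next = 0;
        for (int old = 0; old < docs.size(); old++) {
            map[old] = deleted.get(old) ? -1 : next++;
        }
        return map;
    }

    public static void main(String[] args) {
        DocIdSketch index = new DocIdSketch();
        int a = index.addDocument("a", false); // gets docID 0
        int b = index.addDocument("b", true);  // gets docID 1, immediately deleted
        int c = index.addDocument("c", false); // gets docID 2
        System.out.println(a + " " + b + " " + c);                  // prints "0 1 2"
        System.out.println(Arrays.toString(index.mergeMapping())); // prints "[0, -1, 1]"
    }
}
```

After the merge, document "c" moves from docID 2 down to docID 1, which is exactly the renumbering a parallel index would also have to reproduce.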
Ivan Vasilev wrote:
Hi Lucene Guys,
I have a question that is simple but important for me. I did not
find the answer in the javadoc, so I am asking here.
When adding Documents with IndexWriter.addDocument(doc),
do the documents get Lucene IDs in the order in which they are
added to the IndexWriter? I mean, will the first added doc get Lucene
ID 0, the second Lucene ID 1, etc.?
Below I describe why I am asking.
We plan to split our index into two separate indexes that will be read
by the ParallelReader class. We do this because one of them will
contain field(s) that are indexed and stored and that will change
frequently. So, for the ParallelReader to always return correct data
when documents in the small index are changed, the Lucene
IDs of these docs have to remain the same.
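Why the IDs must stay fixed can be shown with another toy model, assuming that a parallel view simply unions the fields stored at the same docID in each index. As we understand the ParallelReader javadoc, it likewise requires the parallel indexes to have the same number of documents and to be created and modified the same way. ParallelViewSketch and its document method are hypothetical; they are not the ParallelReader API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model (not Lucene's ParallelReader): each "index" is a
 * list of stored-field maps addressed by docID, and the combined document
 * for docID i is the union of the fields at position i in both indexes.
 */
public class ParallelViewSketch {
    public static Map<String, String> document(int docId,
                                               List<Map<String, String>> big,
                                               List<Map<String, String>> small) {
        if (big.size() != small.size()) {
            throw new IllegalStateException("parallel indexes must have the same number of docs");
        }
        Map<String, String> combined = new HashMap<>(big.get(docId));
        combined.putAll(small.get(docId)); // fields from the small, frequently changed index
        return combined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> big = Arrays.asList(
                Collections.singletonMap("id", "A"),
                Collections.singletonMap("id", "B"));
        // Small index rebuilt in the same order: docIDs line up.
        List<Map<String, String>> good = Arrays.asList(
                Collections.singletonMap("status", "A-status"),
                Collections.singletonMap("status", "B-status"));
        // Small index rebuilt in a different order: doc 0 now pairs A with B's data.
        List<Map<String, String>> bad = Arrays.asList(
                Collections.singletonMap("status", "B-status"),
                Collections.singletonMap("status", "A-status"));
        System.out.println(document(0, big, good).get("status")); // prints "A-status"
        System.out.println(document(0, big, bad).get("status"));  // prints "B-status"
    }
}
```

Nothing in the combined view can detect the misalignment in the second case, which is why the rebuild order matters.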
To do this, Karl Wettin suggests a solution described in *LUCENE-879 <https://issues.apache.org/jira/browse/LUCENE-879>*.
I do not like this solution because it involves changing the
Lucene source code, and after each refactoring of Lucene I will potentially
have problems. That solution also relies on optimizing the index, so it
will not be noticeably faster than the one I prefer, which is:
1. Read the whole index and reconstruct the documents, including the
index data, using the TermDocs and TermEnum classes;
2. Change the needed documents;
3. Index the documents into a new index that will replace the initial one.
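The three steps above can be sketched in the same toy model, under the assumption that the small index has no deleted documents (for example, it was just optimized), so its docIDs run from 0 to n-1 without gaps. RebuildSketch and the change callback are hypothetical stand-ins for reading the stored fields (or TermDocs/TermEnum data) and re-adding documents with IndexWriter.addDocument; this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of the rebuild: read every document in ascending docID
 * order, apply the change, and append it to a fresh index. Assumes the old
 * index has no deleted docs, so old docID i ends up as new docID i.
 */
public class RebuildSketch {
    public static List<Map<String, String>> rebuild(
            List<Map<String, String>> oldIndex,
            UnaryOperator<Map<String, String>> change) {
        List<Map<String, String>> newIndex = new ArrayList<>();
        for (int docId = 0; docId < oldIndex.size(); docId++) {
            // Step 1: read the stored fields of docID in ascending order (copy, so the old index is untouched).
            Map<String, String> doc = new HashMap<>(oldIndex.get(docId));
            // Step 2: apply whatever change this document needs.
            Map<String, String> updated = change.apply(doc);
            // Step 3: append; with sequential assignment its new docID equals docId.
            newIndex.add(updated);
        }
        return newIndex;
    }
}
```

Order is preserved here only because documents are re-added in ascending docID order and appends receive sequential docIDs. If an addDocument call failed during a real rebuild, the reply above says it would usually still consume a docID and mark it deleted, so that slot would be missing its data.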
I can simplify this algorithm even further (and speed it up) if all the
fields are always stored: I can read just the stored data, reconstruct
the content of the docs from it, and reindex them into the new index.
But either way, all of my approaches depend on one thing: are
Lucene IDs in the index ordered the same way as the docs are added to
the IndexWriter?
Thanks in Advance,
Ivan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]