Re: Problem with near realtime search

Harald Kirsch Fri, 03 Aug 2012 22:38:52 -0700

Hello Simon,

thanks for the information. I really thought that once a docId is assigned, it is kept until the document is deleted. The only problem I would have expected was docIds that no longer refer to a document because it had been deleted in the meantime. But this is clearly not the case in my setup.

But if docIds change during index rearrangement, then this would of course completely explain the symptoms I saw.


So docIds can definitely change under the hood?
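A hypothetical toy model of why this can happen (plain Java, not Lucene's API; the class, its methods and the example index are invented for illustration): a top-level docId is a segment's docBase plus the document's position inside that segment, and deleted documents keep occupying their slots until a merge compacts them. So a merge can shift the docIds of documents that were never touched. The sketch assumes an update is "delete the old copy, add the new copy to a newer segment", and it keeps segment order during the merge for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT Lucene API: an index is a list of segments,
// each segment a list of stored values; a deleted document is a null slot.
// A top-level docId is the segment's docBase plus the position inside it.
final class DocIdRenumbering {

    // Returns the top-level docId of value, or -1 if it is not in the index.
    static int docIdOf(List<List<String>> segments, String value) {
        int docBase = 0;
        for (List<String> segment : segments) {
            int local = segment.indexOf(value);
            if (local >= 0) {
                return docBase + local;
            }
            // Deleted (null) slots still count toward docBase until merged away.
            docBase += segment.size();
        }
        return -1;
    }

    // Merges all segments into one, dropping deleted (null) slots.
    static List<List<String>> mergeAll(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String value : segment) {
                if (value != null) {
                    merged.add(value);
                }
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    // Index after "a" was updated: its old copy is deleted in the first
    // segment and the new copy was flushed into a new, third segment.
    static List<List<String>> exampleBeforeMerge() {
        List<List<String>> segments = new ArrayList<>();
        segments.add(Arrays.asList(null, "b"));
        segments.add(Arrays.asList("c"));
        segments.add(Arrays.asList("a"));
        return segments;
    }

    public static void main(String[] args) {
        List<List<String>> before = exampleBeforeMerge();
        List<List<String>> after = mergeAll(before);
        for (String value : new String[] {"a", "b", "c"}) {
            System.out.println(value + ": " + docIdOf(before, value)
                    + " -> " + docIdOf(after, value));
        }
    }
}
```

Running main prints `a: 3 -> 2`, `b: 1 -> 0` and `c: 2 -> 1`: every document ends up with a new docId, including "b" and "c", which were never updated.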

Harald.


On 03.08.2012 17:24, Simon Willnauer wrote:

hey harald,

if you use a possibly different searcher (reader) than the one you used for
the search, you will run into problems with the doc IDs, since they
might change during the request. I suggest you use SearcherManager
or NRTManager and carry the searcher reference along when you collect
the stored values. Just keep around the searcher you used, and
NRTManager / SearcherManager will do the job for you.
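The pattern described here (acquire one searcher, run the query and fetch the stored values through that same searcher, then release it) can be sketched with a hypothetical toy. None of the classes below are Lucene API; they only loosely imitate how SearcherManager's acquire()/release() pair is meant to be used, with a reference count standing in for the real reader lifecycle. The `newerDocs` parameter is an invented hook that simulates the index being reopened between search and retrieval.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy, NOT Lucene API: a point-in-time snapshot plus a manager
// that hands it out with reference counting, loosely imitating the
// acquire()/release() usage of Lucene's SearcherManager.
final class PinnedSnapshot {

    // Immutable view of the index: docId i is the i-th stored value.
    static final class Snapshot {
        final List<String> docs;
        int refCount = 1; // the manager's own reference

        Snapshot(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        List<Integer> search(String prefix) {
            List<Integer> hits = new ArrayList<>();
            for (int docId = 0; docId < docs.size(); docId++) {
                if (docs.get(docId).startsWith(prefix)) {
                    hits.add(docId);
                }
            }
            return hits;
        }

        String doc(int docId) {
            if (docId < 0 || docId >= docs.size()) {
                throw new IllegalArgumentException("no such docId: " + docId);
            }
            return docs.get(docId);
        }
    }

    static final class Manager {
        private Snapshot current;

        Manager(List<String> docs) {
            current = new Snapshot(docs);
        }

        synchronized Snapshot acquire() {
            current.refCount++;
            return current;
        }

        synchronized void release(Snapshot snapshot) {
            snapshot.refCount--; // a real manager would close the reader at zero
        }

        // Swaps in a newer snapshot, e.g. after more writes or a merge.
        synchronized void refresh(List<String> docs) {
            Snapshot old = current;
            current = new Snapshot(docs);
            release(old); // drop the manager's own reference to the old one
        }
    }

    // Runs the query and fetches stored values through ONE pinned snapshot,
    // so the docIds always belong to the view they came from.
    static List<String> searchAndFetch(Manager manager, String prefix, List<String> newerDocs) {
        Snapshot snapshot = manager.acquire();
        try {
            List<Integer> hits = snapshot.search(prefix);
            manager.refresh(newerDocs); // simulated reopen between search and retrieval
            List<String> results = new ArrayList<>();
            for (int docId : hits) {
                results.add(snapshot.doc(docId)); // same snapshot, not a fresh one
            }
            return results;
        } finally {
            manager.release(snapshot);
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager(List.of("x1", "y1", "x2"));
        System.out.println(searchAndFetch(manager, "x", List.of("x2")));
    }
}
```

Even though the manager has already moved on to a newer view containing only "x2", the pinned snapshot still resolves docIds 0 and 2 correctly, so main prints `[x1, x2]`.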

simon

On Fri, Aug 3, 2012 at 3:41 PM, Harald Kirsch wrote:

I am trying to (mis)use Lucene a bit like a NoSQL database or, rather, a
persistent map. I am adding 38000 documents to the index at a rate of
1000/s. Because each added item may actually be an update, I perform a
read/change/write sequence for each of the documents.

All goes well until, just after writing the last item, I run a query
that retrieves about 16000 documents. All docIds are collected in a
Collector, and, yes, I make sure to rebase the docIds. Then I iterate over
all docIds found and retrieve the documents, basically like this:

   for (int docId : docIds) {
     Document d = getSearcher().doc(docId);
     // ... process d
   }

where getSearcher() uses IndexReader.openIfChanged() to always get the most
current searcher and makes sure to eventually close the old searcher.
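Rebasing in the Collector is correct as such, but the resulting numbers only mean something relative to the segment layout of the searcher that ran the query. The hypothetical toy below shows the arithmetic; its class and method names are invented for illustration (in Lucene 3.x the real hook is `Collector.setNextReader(IndexReader reader, int docBase)`), and the segment sizes are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy collector, NOT Lucene API: like a Lucene 3.x Collector it
// is handed one segment at a time together with that segment's docBase, and
// it turns segment-local hit numbers into top-level docIds.
final class RebasingCollector {
    private final List<Integer> docIds = new ArrayList<>();
    private int docBase;

    // Called before the hits of each segment are delivered.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // Called with a segment-local doc number; stores the top-level docId.
    void collect(int localDoc) {
        docIds.add(docBase + localDoc);
    }

    // Drives the collector over the segments of ONE reader; segment i starts
    // at the sum of maxDoc over the segments before it.
    static List<Integer> collectAll(int[] maxDocs, int[][] localHits) {
        RebasingCollector collector = new RebasingCollector();
        int docBase = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            collector.setNextSegment(docBase);
            for (int localDoc : localHits[i]) {
                collector.collect(localDoc);
            }
            docBase += maxDocs[i];
        }
        return collector.docIds;
    }

    public static void main(String[] args) {
        // Two segments with maxDoc 3 and 2; hits 0 and 2 in the first, 1 in the second.
        System.out.println(collectAll(new int[] {3, 2}, new int[][] {{0, 2}, {1}}));
    }
}
```

This prints `[0, 2, 4]`. The 4 is only meaningful as long as the first segment still has maxDoc 3; if a later reader has a different layout, the same number points somewhere else.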


At document 15940 I get an exception like this:

Exception in thread "main" java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=1 (got docID=1)
         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:490)
         at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:568)
         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:264)
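The message is consistent with a top-level docID being resolved against a reader whose segment layout differs from the one the query ran on. The hypothetical toy below roughly imitates such a lookup: find the segment whose range contains the docID, subtract that segment's start, and let the segment range-check the local number. It is not Lucene's code; a linear scan stands in for the real segment search, and the segment sizes are invented so that the message matches the one above.

```java
// Hypothetical toy, NOT Lucene code: resolve a top-level docID against a
// reader described only by the maxDoc of each of its segments.
final class CompositeLookup {

    // Returns "segment:local" for docId, or throws when the local number is
    // outside the chosen segment, with a message shaped like the trace above.
    static String resolve(int[] maxDocs, int docId) {
        int[] starts = new int[maxDocs.length];
        for (int i = 1; i < maxDocs.length; i++) {
            starts[i] = starts[i - 1] + maxDocs[i - 1];
        }
        // Pick the last segment whose start is <= docId; an id past the end
        // of the index therefore lands in the last segment.
        int segment = 0;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= docId) {
                segment = i;
            }
        }
        int local = docId - starts[segment];
        if (local < 0 || local >= maxDocs[segment]) {
            throw new IllegalArgumentException("docID must be >= 0 and < maxDoc="
                    + maxDocs[segment] + " (got docID=" + local + ")");
        }
        return segment + ":" + local;
    }

    public static void main(String[] args) {
        System.out.println(resolve(new int[] {3, 1}, 3)); // layout the query ran on
        try {
            resolve(new int[] {2, 1}, 3); // newer layout, same docID
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the layout {3, 1}, docID 3 is valid (segment 1, local 0). Once the first segment has lost a deleted document the layout is {2, 1}, and the same docID 3 falls off the end of the last segment, producing `docID must be >= 0 and < maxDoc=1 (got docID=1)`.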

I can get rid of the exception in one of two ways, neither of which I like:

1) Put a Thread.sleep(1000) just before running the query+document retrieval
part.

2) Use the same IndexSearcher to retrieve all documents instead of calling
getSearcher for each document retrieval.

This is just a single-threaded test program. Besides the main thread, I only
see Lucene merge threads in jvisualvm. A breakpoint on the exception shows
that org.apache.lucene.index.DirectoryReader.document does seem to be
working with the wrong segments, which triggers the exception.

Since Lucene 3.6.1 has been in production use for some time, I doubt it is a
bug in Lucene, but I don't see what I am doing wrong. It might be connected
to trying to get the freshest IndexReader for retrieving documents.

Any better ideas or explanations?

Harald.

--
Harald Kirsch


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]




--
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com

