Aha! OK, now I see how that led to your exception.

When you create a MultiReader, passing in the array of IndexReaders, MultiReader simply holds onto your array. It also computes & caches norms() the first time it's called, based on the total number of docs it sees across all the readers in that array.

But when you then re-opened individual readers in that array without creating a new MultiReader, the cached norms array became "stale", and so it's easily possible to encounter a docID that's out of bounds.

I think a good fix for this sort of trap would be for MultiReader to make a private copy of the array that's passed in. I'll open an issue.
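For illustration, roughly what I mean by a private copy (hypothetical class name, not the actual MultiReader source):

import org.apache.lucene.index.IndexReader;

class CopyingMultiReader {
    private final IndexReader[] subReaders;

    CopyingMultiReader(IndexReader[] subReaders) {
        // Defensive copy: if the caller later swaps reopened readers into
        // its own array, this instance still sees the readers its cached
        // norms were computed against.
        this.subReaders = (IndexReader[]) subReaders.clone();
    }
}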
Mike
Sascha Fahl wrote:
Yes, I am using IndexReader.reopen(). Here is my code doing this:

public void refreshIndeces() throws CorruptIndexException, IOException {
    if ((System.currentTimeMillis() - this.lastRefresh) > this.REFRESH_PERIOD) {
        this.lastRefresh = System.currentTimeMillis();
        boolean refreshFlag = false;
        for (int i = 0; i < this.indeces.length; i++) {
            IndexReader newIR = this.indeces[i].reopen();
            if (newIR != this.indeces[i]) {
                // reopen() returned a new reader, so close the old one
                this.indeces[i].close();
                refreshFlag = true;
            }
            this.indeces[i] = newIR;
        }
        if (refreshFlag) {
            this.multiReader = new MultiReader(this.indeces);
            this.multiSearcher = new IndexSearcher(this.multiReader);
        }
    }
}
As you can see, I am using a MultiReader. Once I create a new MultiReader plus a new IndexSearcher after reopening, the exception goes away. I tested it by updating the index with 50,000 documents while sending 60,000 requests, and nothing bad happened.
Sascha
On 01.07.2008, at 12:14, Michael McCandless wrote:
That's interesting. So you are using IndexReader.reopen() to get a new reader? Are you closing the previous reader?

The exception goes away if you create a new IndexSearcher on the reopened IndexReader?

I don't yet see how that could explain the exception, though. If you reopen() the underlying IndexReader in an IndexSearcher, the original IndexReader should still be intact and still searching the point-in-time snapshot it was opened on. IndexSearcher itself doesn't hold any "state" about the index (I think); it relies on IndexReader for that.
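For reference, the reopen idiom I'd expect is roughly this (the fields are placeholders for however you hold your reader and searcher):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

private IndexReader reader;
private IndexSearcher searcher;

void maybeRefresh() throws IOException {
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        // reopen() returned a new point-in-time reader; the old reader is
        // still intact and must be closed by the caller.
        reader.close();
        reader = newReader;
        // IndexSearcher is bound to the reader it was constructed with,
        // so a new searcher is needed to see the reopened reader.
        searcher = new IndexSearcher(reader);
    }
}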
Mike
Sascha Fahl wrote:
I think I could solve the "problem". It was not a Lucene-specific problem. What I did was reopen the IndexReader without creating a new IndexSearcher object. But of course, since Java passes parameters by value (object references included), the old IndexSearcher object never saw the updated IndexReader: IndexSearcher keeps working with the reference to the IndexReader it was constructed with, not with my reassigned variable. So the cause of the problem was that the requests were always sent to the same IndexSearcher instance, and when that searcher physically accessed the index on disk, the changes made by the IndexWriter were visible to the reopened IndexReader but not to the IndexSearcher.
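In other words, a toy illustration (plain Java, not Lucene code) of what I mean:

class Holder {
    final Object ref;
    Holder(Object ref) { this.ref = ref; }  // stores the reference it was handed

    public static void main(String[] args) {
        Object a = new Object();
        Holder h = new Holder(a);
        a = new Object();  // rebinds only the local variable 'a';
                           // h.ref still refers to the first object
        System.out.println(h.ref == a);  // prints false
    }
}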
Is that the explanation, Mike?
Sascha
On 01.07.2008, at 10:52, Michael McCandless wrote:
By "does not help" do you mean CheckIndex never detects this
corruption, yet you then hit that exception when searching?
By "reopening fails" what do you mean? I thought reopen works
fine, but then it's only the search that fails?
Mike
Sascha Fahl wrote:
Checking the index after adding documents and before reopening the IndexReader does not help. After adding documents nothing bad happens, and CheckIndex says the index is all right. When I check the index right before the reopen, CheckIndex again detects no corruption and says the index is OK, and yet the reopening still fails.
Sascha
On 30.06.2008, at 18:34, Michael McCandless wrote:
This is spooky: that exception means you have some sort of index corruption. The TermScorer thinks it found doc ID 37389, which is out of bounds.

Reopening an IndexReader while IndexWriter is writing should be completely fine.

Is this easily reproduced? If so, and if you could narrow it down to the sequence of added documents that triggers it, that'd be awesome.

It's very strange that you see the corruption go away. Can you run CheckIndex (java org.apache.lucene.index.CheckIndex <indexDir>) to see if it detects any corruption? In fact, if you could run CheckIndex after each IndexWriter session to isolate which batch of added documents causes the corruption, that would help us narrow it down.
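For example, something like this (the jar name and index path are placeholders for your setup):

java -cp lucene-core-2.3.2.jar org.apache.lucene.index.CheckIndex /path/to/index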
Are you changing any of the settings in IndexWriter? Are you
using multiple threads? Which exact JRE version and OS are you
using? Are you creating a new index at the start of each run?
Mike
Sascha Fahl wrote:
Hi,
I am seeing some strange behaviour from Lucene. The scenario is the following: while adding documents to my index (every doc is pretty small; the doc count is about 12,000) I have implemented custom flushing and committing behaviour (roughly sketched at the end of this mail). Before adding documents to the index I check whether the ramDocCount has reached a certain number, or whether the last commit was a while ago. If so, I flush the buffered documents and reopen the IndexWriter. So far, so good; indexing works very well. The problem occurs when I send requests through the IndexReader while writing documents with the IndexWriter: I send around 10,000 requests to Lucene and reopen the IndexReader every 100 requests (only for testing) if it is not current. The first roughly 4,000 requests work very well, but afterwards I always get the following exception:
java.lang.ArrayIndexOutOfBoundsException: 37389
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:126)
    at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:112)
    at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:172)
    at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:146)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
    at org.apache.lucene.search.Hits.<init>(Hits.java:67)
    at org.apache.lucene.search.Searcher.search(Searcher.java:46)
    at org.apache.lucene.search.Searcher.search(Searcher.java:38)
This seems to be a temporary problem: after opening a new IndexReader once all documents have been added, everything is OK again and all 10,000 requests succeed.
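For reference, the custom flush behaviour I described looks roughly like this (names and thresholds are placeholders, not my exact code):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

private IndexWriter writer;                        // open on indexDir with analyzer
private long lastCommit = System.currentTimeMillis();
private static final int MAX_RAM_DOCS = 1000;      // placeholder threshold
private static final long COMMIT_PERIOD = 60000L;  // placeholder, in ms

void add(Document doc) throws IOException {
    // Flush when enough docs are buffered in RAM, or when the last commit
    // was too long ago; "reopening" the writer here means closing it and
    // creating a new one on the same directory.
    if (writer.numRamDocs() >= MAX_RAM_DOCS
            || (System.currentTimeMillis() - lastCommit) > COMMIT_PERIOD) {
        writer.close();
        writer = new IndexWriter(indexDir, analyzer, false);
        lastCommit = System.currentTimeMillis();
    }
    writer.addDocument(doc);
}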
So what could be the problem here?
reg,
sascha
Sascha Fahl
Software Development

evenity GmbH
Zu den Mühlen 19
D-35390 Gießen

Mail: [EMAIL PROTECTED]