Hi Michael,

On 25/03/10 18:45, Michael McCandless wrote:
> Hi Renaud,
>
> It's great that you're pushing flex forward so much :) You're making
> some cool sounding codecs!  I'm really looking forward to seeing
> indexing/searching performance results on Wikipedia...
I'll share them for sure whenever the results are ready ;o).
> It sounds most likely there's a bug in the PFor impl? (Since you don't
> hit this exception with the others...).
It seems so, but I also find it strange that I cannot reproduce it with synthetic data.
> During merge, each segment's docIDs are rebased according to how many
> non-deleted docs there are in all prior segments.  One possibility
> here is a given segment thought it had N deletions but in fact
> encountered fewer than N while iterating its docs.  This would cause
> the next segment to have too-low a base which can cause this exact
> exception on crossing from one segment to the next.  (Ie the very
> first doc of the next segment will suddenly be <= prior doc(s)).
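
The rebasing failure described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not Lucene's merge code: the class DocIdRebaseSketch, the method merge, and the claimedLive array (the live-doc count a segment reports, i.e. its doc count minus the deletions it believes it has) are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of docID rebasing during a merge; not Lucene code.
final class DocIdRebaseSketch {

    /**
     * actualLive[s] is the number of docs actually encountered while
     * iterating segment s; claimedLive[s] is the number of live docs the
     * segment reports. The base of the next segment advances by the
     * claimed count, as in the scenario described above.
     */
    static List<Integer> merge(int[] actualLive, int[] claimedLive) {
        List<Integer> merged = new ArrayList<>();
        int base = 0;
        for (int s = 0; s < actualLive.length; s++) {
            for (int i = 0; i < actualLive[s]; i++) {
                merged.add(base + i); // rebased docID in the merged segment
            }
            base += claimedLive[s];   // too low if the segment over-counted deletions
        }
        return merged;
    }
}
```

With consistent counts, merge(new int[] {3, 2}, new int[] {3, 2}) yields 0, 1, 2, 3, 4. If the first segment reports only 2 live docs but 3 are encountered, the result is 0, 1, 2, 2, 3: the first doc of the second segment is <= the last doc of the first.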

> But... if that's happening (ie, bug is in Lucene not in PFor impl),
> you'd expect the other codecs to hit it too.
>
> Are you using multiple threads for indexing?  Are you also mixing in
> deletions (or updateDocument calls)?
There are no deletions; I just create the index from scratch, and each document I add has a unique identifier. I am using a single thread for indexing: reading the list of Wikipedia articles sequentially, putting the content into a single field, and adding the document to the index. A commit is done every 10K documents. I have tried different mergeFactors (2 and 20), but whenever the first merge occurs, I get this CorruptIndexException.

I will continue to debug, but if I could at least get the faulty segment and the faulty term (or even better, the index of the faulty block), I would be able to display the content of the blocks and see whether there are problems in the PFor encoding.
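
As a rough sketch of that kind of inspection, assuming the blocks of one term have already been decoded to absolute docIDs, the first offending block could be located as follows. The class BlockOrderCheck and the method firstViolation are hypothetical names and are not part of Lucene or of the PFor codec.

```java
// Hypothetical helper: finds where decoded docIDs stop being strictly increasing.
final class BlockOrderCheck {

    /**
     * blocks[b][i] is the i-th decoded docID of block b.
     * Returns {blockIndex, offsetInBlock} of the first docID that is
     * <= its predecessor, or null if the whole sequence is increasing.
     */
    static int[] firstViolation(int[][] blocks) {
        int prev = -1; // docIDs are assumed to be non-negative
        for (int b = 0; b < blocks.length; b++) {
            for (int i = 0; i < blocks[b].length; i++) {
                int doc = blocks[b][i];
                if (doc <= prev) {
                    return new int[] {b, i};
                }
                prev = doc;
            }
        }
        return null;
    }
}
```

The returned pair identifies which block to dump and which entry in it first breaks the ordering.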

Cheers,
--
Renaud Delbru

