Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

Michael McCandless Thu, 07 Nov 2013 02:59:01 -0800

OK, so CheckIndex found that the del files for 3 segments could not be
found, e.g. it wanted to open _24xf_9l.del (yet it's _24xf_9k.del
that's actually there).
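
For reference, Lucene 3.x encodes the deletions generation as a base-36 suffix
on the segment name, so the two file names can be compared numerically. Below
is a minimal sketch in plain Java with no Lucene dependency; the class and
method names (DelFileName, segment, generation) are made up for illustration
and are not part of any Lucene API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper: splits a Lucene 3.x deletions file name into its parts. */
final class DelFileName {
    // Expected shape: <segment>_<generation>.del, e.g. _24xf_9l.del
    private static final Pattern DEL =
        Pattern.compile("(_[0-9a-z]+)_([0-9a-z]+)\\.del");

    private static Matcher match(String fileName) {
        Matcher m = DEL.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a .del file name: " + fileName);
        }
        return m;
    }

    /** Returns the segment name, e.g. "_24xf". */
    static String segment(String fileName) {
        return match(fileName).group(1);
    }

    /** Returns the deletions generation, decoded from base 36. */
    static long generation(String fileName) {
        return Long.parseLong(match(fileName).group(2), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        String wanted = "_24xf_9l.del"; // name CheckIndex tried to open
        String onDisk = "_24xf_9k.del"; // name actually present in the directory
        System.out.println(segment(wanted) + ": wanted generation "
            + generation(wanted) + ", on disk generation " + generation(onDisk));
    }
}
```

Decoded this way, 9l is generation 345 and 9k is generation 344, i.e. the
segments file references a deletions generation one newer than the file that
is on disk.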


I wonder why CheckIndex doesn't report the exception you saw in flush, with
that way-future segment (_33gg.cfs): that's weird.
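
To put a number on "way-future": segment names are a base-36 counter behind
the leading underscore, so they can be decoded and compared. A rough sketch,
again plain Java with hypothetical names (SegmentCounter, counter), using the
segment names from the CheckIndex output quoted below:

```java
/** Illustrative helper: decodes the base-36 counter in a Lucene segment name. */
final class SegmentCounter {
    /** "_2l3x" -> 120669; the leading underscore is not part of the number. */
    static int counter(String segmentName) {
        if (!segmentName.startsWith("_")) {
            throw new IllegalArgumentException("not a segment name: " + segmentName);
        }
        return Integer.parseInt(segmentName.substring(1), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Segments listed by CheckIndex for segments_2bsy
        String[] inCommit = {"_9n", "_9o", "_6k", "_6l", "_ug3",
                             "_1qro", "_24xf", "_2czq", "_2d00", "_2l3x"};
        int highest = 0;
        for (String name : inCommit) {
            highest = Math.max(highest, counter(name));
        }
        // Segment named in the FileNotFoundException thrown from flush
        int fromFlush = counter("_33gg");
        System.out.println("highest segment counter in commit: " + highest);
        System.out.println("counter of _33gg: " + fromFlush);
        System.out.println("difference: " + (fromFlush - highest));
    }
}
```

The newest segment in that commit, _2l3x, decodes to 120669, while _33gg
decodes to 144448.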

But ... I suspect you may be hitting
https://issues.apache.org/jira/browse/LUCENE-3418 -- that issue
causes IW.commit() to not actually "work", so that if you commit
successfully and then there's power loss / OS crash, you could lose
files.  But you said there was no known power loss / crash?

It's also odd that you have two very different segments_N files in the index:

01/11/2013  03:00 AM             2,589 segments_29tx
02/11/2013  01:06 AM             2,369 segments_2bsy

And CheckIndex opened the newer one; maybe try temporarily moving that
new one (segments_2bsy) out of the way and then see if the index is
intact (this is a long shot ... it's really weird that such a much
older segments file is still there).
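
A sketch of that "move it out of the way" step, in case it helps. It assumes
Java 7+ (java.nio.file), that no IndexWriter has the index open, and that you
work on a copy of the corrupt index rather than the original. The class and
method names (SegmentsFileMover, newestSegmentsFile, moveAside) and the two
command-line arguments are made up for illustration. Note that Lucene 3.x also
keeps a segments.gen file, which this sketch leaves untouched.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative helper: moves the newest segments_N file into a backup directory. */
final class SegmentsFileMover {
    private static final String PREFIX = "segments_";

    /** Decodes the base-36 generation from a name such as "segments_2bsy". */
    static long generation(String fileName) {
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    /** Returns the segments_N file with the highest generation, or null if none. */
    static Path newestSegmentsFile(Path indexDir) throws IOException {
        Path newest = null;
        long newestGen = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                if (!name.matches("segments_[0-9a-z]+")) {
                    continue; // skip anything that is not a plain segments_N file
                }
                long gen = generation(name);
                if (gen > newestGen) {
                    newestGen = gen;
                    newest = file;
                }
            }
        }
        return newest;
    }

    /** Moves the file into backupDir (created if missing) and returns the new path. */
    static Path moveAside(Path file, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        return Files.move(file, backupDir.resolve(file.getFileName()));
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args[0]);  // copy of the corrupt index
        Path backupDir = Paths.get(args[1]); // where the newest segments_N goes
        Path newest = newestSegmentsFile(indexDir);
        if (newest == null) {
            System.out.println("no segments_N file found in " + indexDir);
            return;
        }
        System.out.println("moved " + newest + " to " + moveAside(newest, backupDir));
    }
}
```

After that, re-run CheckIndex from the command line against the same
directory; to undo, move the file back.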

Is there any replication involved here, besides addIndexes?  I.e.,
anything that directly copies files into the index?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Nov 7, 2013 at 4:32 AM, Gili Nachum <gilinac...@gmail.com> wrote:
> Thanks Mike and Uwe.
> I already reindexed in production, my goal is to get to the root cause to
> make sure it doesn't happen again.
> Will remove the flush(). No idea why it's there.
> Attaching checkIndex.Main() output (why did I bother writing my own output
> :#)
>
> *Output:*
> Opening index @ C:\\customers\\SC\\corrupt catalog index E3 -
> Nov\\WDPP29715_ap03\\opt\\WAS\\LotusConnections\\Data\\catalog\\index\\Places\\index
>
> Segments file=segments_2bsy numSegments=10 version=FORMAT_3_1 [Lucene 3.1]
> userData={VERSION=9460}
>   1 of 10: name=_9n docCount=4141
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=3.89
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_9n_2t.del]
>     test: open reader.........OK [209 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [82966 terms; 295120 terms/docs pairs;
> 300750 tokens]
>     test: stored fields.......OK [72872 total field count; avg 18.533
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   2 of 10: name=_9o docCount=19999
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=21.487
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_9o_lg.del]
>     test: open reader.........OK [1396 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [318090 terms; 1773898 terms/docs pairs;
> 1888318 tokens]
>     test: stored fields.......OK [390466 total field count; avg 20.989
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   3 of 10: name=_6k docCount=2000
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=2.386
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_6k_62.del]
>     test: open reader.........OK [389 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [46699 terms; 193013 terms/docs pairs;
> 178450 tokens]
>     test: stored fields.......OK [35965 total field count; avg 22.325
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   4 of 10: name=_6l docCount=2000
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=2.477
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_6l_hx.del]
>     test: open reader.........OK [864 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [55730 terms; 196164 terms/docs pairs;
> 117213 tokens]
>     test: stored fields.......OK [23202 total field count; avg 20.424
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   5 of 10: name=_ug3 docCount=2949
>     compound=false
>     hasProx=true
>     numFiles=9
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.io.FileNotFoundException: _ug3_10h.del
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)
>
>   6 of 10: name=_1qro docCount=2701
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=3.478
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.16.1.el5}
>     has deletions [delFileName=_1qro_p0.del]
>     test: open reader.........OK [1473 deleted docs]
>     test: fields..............OK [30 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [76625 terms; 278909 terms/docs pairs;
> 143954 tokens]
>     test: stored fields.......OK [25932 total field count; avg 21.117
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   7 of 10: name=_24xf docCount=1645
>     compound=true
>     hasProx=true
>     numFiles=2
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.io.FileNotFoundException: _24xf_9l.del
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)
>
>   8 of 10: name=_2czq docCount=681
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=0.947
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.16.1.el5}
>     has deletions [delFileName=_2czq_3.del]
>     test: open reader.........OK [465 deleted docs]
>     test: fields..............OK [30 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [25569 terms; 72519 terms/docs pairs;
> 21076 tokens]
>     test: stored fields.......OK [4328 total field count; avg 20.037 fields
> per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   9 of 10: name=_2d00 docCount=1997
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=2.564
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, os.version=2.6.18-371.el5}
>     has deletions [delFileName=_2d00_1.del]
>     test: open reader.........OK [1 deleted docs]
>     test: fields..............OK [31 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [57448 terms; 204959 terms/docs pairs;
> 241754 tokens]
>     test: stored fields.......OK [44545 total field count; avg 22.317
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   10 of 10: name=_2l3x docCount=865
>     compound=true
>     hasProx=true
>     numFiles=1
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.io.FileNotFoundException: _2l3x.cfs
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)
>
> WARNING: 3 broken segments (containing 5459 documents) detected
> WARNING: would write new segments file, and 5459 documents would be lost,
> if -fix were specified
>
>
> On Wed, Nov 6, 2013 at 11:07 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Hi,
>> > Hello,
>> > I got an index corruption in production, and was wondering if it might be a
>> > known bug (still with Lucene 3.1), or is my code doing something wrong.
>> > It's a local disk index. No known machine power loss. Not supposed to even
>> > happen, right?
>> >
>> > This index that got corrupted is updated every 30sec, by adding to it a small
>> > delta index (using addIndexes()) that was replicated from another machine.
>> > The series of writer actions to update the index is:
>> > 1. writer.deleteDocuments(q);
>> > 2. writer.flush(false, true);
>> > 3. writer.addIndexes(reader);
>> > 4. writer.commit(map);
>> >
>> > Is the index exposed to corruptions only during commit, or is addIndexes()
>> > risky by itself (doc says it's not).
>> > LUCENE-2610 <https://issues.apache.org/jira/browse/LUCENE-2610> kind of
>> > looks in the neighborhood, though it's not a bug report.
>>
>> Hi, LUCENE-2610 is completely unrelated, as this only affects
>> addIndexes(Directory...), not addIndexes(IndexReader...). The one you are
>> using is using the natural Lucene merging as it is done all the time while
>> indexing documents (Lucene internally uses the same code as
>> addIndexes(IndexReader) to merge segments). addIndexes(Directory) is very
>> different and more likely to have bugs in older Lucene versions (this one
>> copies index files around without touching them, but renaming them to have
>> new segment names - which is somewhat "unnatural"; it also does not
>> correctly lock the index directory in older versions).
>>
>> Why do you call flush() at all? I would leave that out, there is no reason
>> to do this from userland code.
>>
>> To "repair" the index, use the CheckIndex command line tool with the "fix"
>> option. This will delete the segment that is missing (_33gg.cfs). Of course
>> this data is lost, but as the file is not there it is lost already. This
>> will just remove the metadata of this missing segment from your index. But
>> before doing this, you should check what CheckIndex prints out without the
>> fix option - the info you posted is not the console output the tool prints
>> when run from the command line with assertions enabled (-ea JVM option). The
>> output looks like the toString() of the Java API of the CheckIndex class,
>> which is not so helpful. Please post the full output of the tool executed
>> from the command line:
>>         java -cp lucene-core-3.1.0.jar org.apache.lucene.index.CheckIndex
>> <options....>
>>
>> Uwe
>>
>> > I'll add an ls -l output in a follow up email.
>> >
>> > Technically the first indication of problems is when calling flush, but it
>> > could be that the previous writer action left it broken for flush to fail.
>> > My stack trace is:
>> > Caused by: java.io.FileNotFoundException: /disks/data1/opt/WAS/LotusConnections/Data/catalog/index/Places/index/_33gg.cfs (No such file or directory)
>> >     at java.io.RandomAccessFile.open(Native Method)
>> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>> >     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
>> >     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
>> >     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
>> >     at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
>> >     at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:66)
>> >     at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:113)
>> >     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
>> >     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
>> >     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
>> >     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
>> >     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
>> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
>> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
