So you have a segment (_tej) with 22201 docs, all but 30 of which are deleted, and somehow one of the posting lists in _tej.frq is referencing an out-of-bounds docID 34950 (valid docIDs for that segment are 0 through 22200). Odd...
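To illustrate why such a docID trips the exception, here is a minimal Java sketch. It assumes a simplified deleted-docs bit set: `DeletedDocsSketch` and its methods are hypothetical names made up for this example, not Lucene's actual `BitVector` code. The bit set is sized for the segment's 22201 docs, so a lookup for docID 34950 falls outside it.

```java
// Hypothetical, simplified stand-in for a per-segment deleted-docs bit set.
// This is NOT Lucene's BitVector; it only illustrates the failure mode.
final class DeletedDocsSketch {
    private final byte[] bits;
    private final int size; // number of docs in the segment

    DeletedDocsSketch(int size) {
        this.size = size;
        this.bits = new byte[(size >> 3) + 1]; // one bit per doc
    }

    // Marks a doc as deleted.
    void set(int docId) {
        check(docId);
        bits[docId >> 3] |= 1 << (docId & 7);
    }

    // Returns true if the doc is marked deleted.
    boolean get(int docId) {
        check(docId);
        return (bits[docId >> 3] & (1 << (docId & 7))) != 0;
    }

    // Rejects any docID outside [0, size).
    private void check(int docId) {
        if (docId < 0 || docId >= size) {
            throw new ArrayIndexOutOfBoundsException(docId);
        }
    }

    public static void main(String[] args) {
        DeletedDocsSketch deleted = new DeletedDocsSketch(22201); // docCount of _tej
        deleted.set(100);
        System.out.println("doc 100 deleted? " + deleted.get(100));
        try {
            deleted.get(34950); // docID read from the corrupt posting list
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the deletions structure is sized by the segment's docCount, so any docID at or beyond that count cannot have come from a healthy posting list.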
Are you sure the IO system doesn't have any consistency issues? What environment are you running on (machine, OS, filesystem, JVM)?

You could re-run CheckIndex with -fix to remove that one problematic segment (you'd lose the 30 docs in it though).

Mike

Brian Whitman <br...@echonest.com> wrote:

> Here's checkindex:
>
> NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene', so assertions are enabled
>
> Opening index @ /vol/solr/data/index/
>
> Segments file=segments_vxx numSegments=8 version=FORMAT_HAS_PROX [Lucene 2.4]
>   1 of 8: name=_ks4 docCount=2504982
>     compound=false
>     hasProx=true
>     numFiles=11
>     size (MB)=3,965.695
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [343 fields]
>     test: terms, freq, prox...OK [37238560 terms; 161527224 terms/docs pairs; 186273362 tokens]
>     test: stored fields.......OK [55813402 total field count; avg 22.281 fields per doc]
>     test: term vectors........OK [7998458 total vector count; avg 3.193 term/freq vector fields per doc]
>
>   2 of 8: name=_oaw docCount=514635
>     compound=false
>     hasProx=true
>     numFiles=12
>     size (MB)=746.887
>     has deletions [delFileName=_oaw_1rb.del]
>     test: open reader.........OK [155528 deleted docs]
>     test: fields, norms.......OK [172 fields]
>     test: terms, freq, prox...OK [7396227 terms; 28146962 terms/docs pairs; 17298364 tokens]
>     test: stored fields.......OK [5736012 total field count; avg 15.973 fields per doc]
>     test: term vectors........OK [1045176 total vector count; avg 2.91 term/freq vector fields per doc]
>
>   3 of 8: name=_tll docCount=827949
>     compound=false
>     hasProx=true
>     numFiles=12
>     size (MB)=761.782
>     has deletions [delFileName=_tll_2fs.del]
>     test: open reader.........OK [39283 deleted docs]
>     test: fields, norms.......OK [180 fields]
>     test: terms, freq, prox...OK [10925397 terms; 43361019 terms/docs pairs; 42123294 tokens]
>     test: stored fields.......OK [8673255 total field count; avg 10.997 fields per doc]
>     test: term vectors........OK [880272 total vector count; avg 1.116 term/freq vector fields per doc]
>
>   4 of 8: name=_tdx docCount=18372
>     compound=false
>     hasProx=true
>     numFiles=12
>     size (MB)=56.856
>     has deletions [delFileName=_tdx_9.del]
>     test: open reader.........OK [18368 deleted docs]
>     test: fields, norms.......OK [50 fields]
>     test: terms, freq, prox...OK [261974 terms; 2018842 terms/docs pairs; 150 tokens]
>     test: stored fields.......OK [76 total field count; avg 19 fields per doc]
>     test: term vectors........OK [14 total vector count; avg 3.5 term/freq vector fields per doc]
>
>   5 of 8: name=_te8 docCount=19929
>     compound=false
>     hasProx=true
>     numFiles=12
>     size (MB)=60.475
>     has deletions [delFileName=_te8_a.del]
>     test: open reader.........OK [19900 deleted docs]
>     test: fields, norms.......OK [72 fields]
>     test: terms, freq, prox...OK [276045 terms; 2166958 terms/docs pairs; 1196 tokens]
>     test: stored fields.......OK [522 total field count; avg 18 fields per doc]
>     test: term vectors........OK [132 total vector count; avg 4.552 term/freq vector fields per doc]
>
>   6 of 8: name=_tej docCount=22201
>     compound=false
>     hasProx=true
>     numFiles=12
>     size (MB)=65.827
>     has deletions [delFileName=_tej_o.del]
>     test: open reader.........OK [22171 deleted docs]
>     test: fields, norms.......OK [50 fields]
>     test: terms, freq, prox...FAILED
>     WARNING: would remove reference to this segment (-fix was not specified); full exception:
> java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
>   at org.apache.lucene.util.BitVector.get(BitVector.java:91)
>   at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
>   at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
>   at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:222)
>   at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:433)
>
>   7 of 8: name=_1agw docCount=1717926
>     compound=false
>     hasProx=true
>     numFiles=12
>     size (MB)=2,390.413
>     has deletions [delFileName=_1agw_1.del]
>     test: open reader.........OK [1 deleted docs]
>     test: fields, norms.......OK [438 fields]
>     test: terms, freq, prox...OK [20959015 terms; 101603282 terms/docs pairs; 123561985 tokens]
>     test: stored fields.......OK [26248407 total field count; avg 15.279 fields per doc]
>     test: term vectors........OK [4911368 total vector count; avg 2.859 term/freq vector fields per doc]
>
>   8 of 8: name=_1agz docCount=1
>     compound=false
>     hasProx=true
>     numFiles=8
>     size (MB)=0
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [6 fields]
>     test: terms, freq, prox...OK [6 terms; 6 terms/docs pairs; 6 tokens]
>     test: stored fields.......OK [6 total field count; avg 6 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
> WARNING: 1 broken segments detected
> WARNING: 30 documents would be lost if -fix were specified
>
> NOTE: would write new segments file [-fix was not specified]
>
> On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman <br...@echonest.com> wrote:
>
> > I will but I bet I can guess what happened -- this index has many
> > duplicates in it as well (same uniqueKey id multiple times) - this happened
> > to us once before and it was because the solr server went down during an
> > add. We may have to re-index, but I will run checkIndex now. Thanks
> > (Thread for dupes here :
> > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200803.mbox/%3c4ed8c459-1b0f-41cc-986c-4ffceef82...@variogr.am%3e )
> >
> > On Fri, Jan 2, 2009 at 3:44 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >
> >> It looks like your index has some kind of corruption. Were there any other
> >> exceptions prior to this one, or, any previous problems with the OS/IO
> >> system?
> >>
> >> Can you run CheckIndex (java org.apache.lucene.index.CheckIndex to see
> >> usage) and post the output?
> >>
> >> Mike
> >>
> >> Brian Whitman <br...@echonest.com> wrote:
> >>
> >> > I am getting this on a 10GB index (via solr 1.3) during an optimize:
> >> >
> >> > Jan 2, 2009 6:51:52 PM org.apache.solr.common.SolrException log
> >> > SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982 _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926 _1agz:C1 into _1ah2 [optimize]
> >> >   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
> >> >   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
> >> >   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
> >> >   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
> >> >   ...
> >> >
> >> > Exception in thread "Lucene Merge Thread #2" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
> >> >   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
> >> >   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
> >> > Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
> >> >   at org.apache.lucene.util.BitVector.get(BitVector.java:91)
> >> >   at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
> >> >   at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
> >> >   ...
> >> >
> >> > Does anyone know how this is caused and how I can fix it? It happens with
> >> > every optimize. Commits were very slow on this index as well (40x as slow as
> >> > a similar index on another machine) I have plenty of disk space (many 100s
> >> > of GB) free.