Without the optimize, it looks like there are errors on all segments except
the first:

Opening index @ D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2

Segments file=segments_2 numSegments=3 version=FORMAT_DIAGNOSTICS [Lucene
2.9]
  1 of 3: name=_0 docCount=413557
    compound=false
    hasProx=true
    numFiles=8
    size (MB)=1,148.795
    diagnostics = {os.version=5.2, os=Windows 2003, lucene.version=2.9
exported - 2009-10-27 15:31:52, source=flush, os.arch=amd64,
java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    docStoreOffset=0
    docStoreSegment=_0
    docStoreIsCompoundFile=false
    no deletions
    test: open reader.........OK
    test: fields..............OK [33 fields]
    test: field norms.........OK [33 fields]
    test: terms, freq, prox...OK [7704599 terms; 180318249 terms/docs pairs;
340258711 tokens]
    test: stored fields.......OK [1240671 total field count; avg 3 fields
per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  2 of 3: name=_1 docCount=359203
    compound=false
    hasProx=true
    numFiles=8
    size (MB)=1,125.103
    diagnostics = {os.version=5.2, os=Windows 2003, lucene.version=2.9
exported - 2009-10-27 15:31:52, source=flush, os.arch=amd64,
java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    docStoreOffset=413557
    docStoreSegment=_0
    docStoreIsCompoundFile=false
    no deletions
    test: open reader.........OK
    test: fields..............OK [33 fields]
    test: field norms.........OK [33 fields]
    test: terms, freq, prox...ERROR [term literals:cfid196$ docFreq=43 !=
num docs seen 4 + num docs deleted 0]
java.lang.RuntimeException: term literals:cfid196$ docFreq=43 != num docs
seen 4 + num docs deleted 0
    at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
    test: stored fields.......OK [1077609 total field count; avg 3 fields
per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

  3 of 3: name=_2 docCount=304659
    compound=false
    hasProx=true
    numFiles=8
    size (MB)=961.764
    diagnostics = {os.version=5.2, os=Windows 2003, lucene.version=2.9
exported - 2009-10-27 15:31:52, source=flush, os.arch=amd64,
java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    docStoreOffset=772760
    docStoreSegment=_0
    docStoreIsCompoundFile=false
    no deletions
    test: open reader.........OK
    test: fields..............OK [33 fields]
    test: field norms.........OK [33 fields]
    test: terms, freq, prox...ERROR [term contents:? docFreq=1 != num docs
seen 245 + num docs deleted 0]
java.lang.RuntimeException: term contents:? docFreq=1 != num docs seen 245 +
num docs deleted 0
    at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
    test: stored fields.......OK [913977 total field count; avg 3 fields per
doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

WARNING: 2 broken segments (containing 663862 documents) detected
WARNING: would write new segments file, and 663862 documents would be lost,
if -fix were specified
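
(For reference, the output above came from running Lucene's CheckIndex tool
from the command line -- roughly this, as a sketch assuming the 2.9 core jar
name:

    java -cp lucene-core-2.9.0.jar org.apache.lucene.index.CheckIndex D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2

Passing -fix as an extra argument is what would drop the two broken segments
and lose the 663862 documents mentioned in the warning, so I'm holding off
on that.)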


Do the unit tests create multi-segment indexes?
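
If not, here's a rough sketch of one way to force a multi-segment index and
then run the same checks CheckIndex does -- written against the 2.9 API; the
class name, field name and doc counts are just for illustration:

    import java.io.File;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MultiSegmentCheck {
      public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
            true, IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setMaxBufferedDocs(100);  // small buffer forces several flushes
        writer.setMergeFactor(20);       // keep the flushed segments from merging
        for (int i = 0; i < 1000; i++) {
          Document doc = new Document();
          doc.add(new Field("contents", "doc " + i,
              Field.Store.YES, Field.Index.ANALYZED));
          writer.addDocument(doc);
        }
        writer.commit();  // commit but skip optimize(), keeping all segments
        writer.close();

        CheckIndex checker = new CheckIndex(dir);
        CheckIndex.Status status = checker.checkIndex();
        System.out.println(status.clean ? "OK" : "CORRUPT");
      }
    }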

Peter

On Tue, Oct 27, 2009 at 3:08 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> It's reproducible with a large number of docs (>1 million), but not with
> 100K docs.
> I got the same error with JVM 1.6.0_16.
> The index was optimized after all docs were added. I'll try removing the
> optimize.
>
> Peter
>
>
> On Tue, Oct 27, 2009 at 2:57 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> This is odd -- is it reproducible?
>>
>> Can you narrow it down to a small set of docs that when indexed
>> produce a corrupted index?
>>
>> If you attempt to optimize the index, does it fail?
>>
>> Mike
>>
>> On Tue, Oct 27, 2009 at 1:40 PM, Peter Keegan <peterlkee...@gmail.com>
>> wrote:
>> > It seems the index is corrupted immediately after the initial build
>> > (ample disk space was provided):
>> >
>> > Output from CheckIndex:
>> >
>> > Opening index @ D:\mnsavs\lresumes1\lresumes1.luc\lresumes1.search.main.2
>> >
>> > Segments file=segments_3 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
>> >  1 of 1: name=_7 docCount=1077025
>> >    compound=false
>> >    hasProx=true
>> >    numFiles=8
>> >    size (MB)=3,201.196
>> >    diagnostics = {optimize=true, mergeFactor=7, os.version=5.2, os=Windows
>> > 2003, mergeDocStores=false, lucene.version=2.9 exported - 2009-10-26
>> > 07:58:55, source=merge, os.arch=amd64, java.version=1.6.0_03,
>> > java.vendor=Sun Microsystems Inc.}
>> >    docStoreOffset=0
>> >    docStoreSegment=_0
>> >    docStoreIsCompoundFile=false
>> >    no deletions
>> >    test: open reader.........OK
>> >    test: fields..............OK [33 fields]
>> >    test: field norms.........OK [33 fields]
>> >    test: terms, freq, prox...ERROR [term contents:? docFreq=1 != num docs
>> > seen 482 + num docs deleted 0]
>> > java.lang.RuntimeException: term contents:? docFreq=1 != num docs seen 482 +
>> > num docs deleted 0
>> >    at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>> >    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>> >    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>> >    test: stored fields.......OK [3231075 total field count; avg 3 fields
>> > per doc]
>> >    test: term vectors........OK [0 total vector count; avg 0 term/freq
>> > vector fields per doc]
>> > FAILED
>> >    WARNING: fixIndex() would remove reference to this segment; full
>> > exception:
>> > java.lang.RuntimeException: Term Index test failed
>> >    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>> >    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>> >
>> > WARNING: 1 broken segments (containing 1077025 documents) detected
>> > WARNING: would write new segments file, and 1077025 documents would be
>> > lost, if -fix were specified
>> >
>> > Searching on this index seems to be fine, though.
>> >
>> > Here is the IndexWriter log from the build:
>> >
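>> > (fwiw, the log was captured by enabling the writer's infoStream --
>> > roughly this, as a sketch against the 2.9 API, with "writer" the
>> > IndexWriter, java.io imports assumed, and the log file name just for
>> > illustration:
>> >
>> >     writer.setInfoStream(new PrintStream(
>> >         new FileOutputStream("iw.log"), true));
>> > )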
>> > IFD [Indexer]: setInfoStream
>> > deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@2a9cfec1
>> > IW 0 [Indexer]: setInfoStream:
>> > dir=org.apache.lucene.store.simplefsdirect...@d:\mnsavs\lresumes1\lresumes1.luc\lresumes1.update.main.2
>> > autoCommit=false
>> > mergepolicy=org.apache.lucene.index.logbytesizemergepol...@291946c2
>> > mergescheduler=org.apache.lucene.index.concurrentmergeschedu...@3a747fa2
>> > rambuffersizemb=16.0
>> > maxBufferedDocs=-1 maxBuffereDeleteTerms=-1
>> > maxFieldLength=2147483647 index=
>> > IW 0 [Indexer]: setRAMBufferSizeMB 910.25
>> > IW 0 [Indexer]: setMaxBufferedDocs 1000000
>> > IW 0 [Indexer]: flush at getReader
>> > IW 0 [Indexer]:   flush: segment=null docStoreSegment=null docStoreOffset=0
>> > flushDocs=false flushDeletes=true flushDocStores=false numDocs=0
>> > numBufDelTerms=0
>> > IW 0 [Indexer]:   index before flush
>> > IW 0 [UpdWriterBuild]: DW:   RAM: now flush @ usedMB=886.463
>> > allocMB=886.463 deletesMB=23.803 triggerMB=910.25
>> > IW 0 [UpdWriterBuild]:   flush: segment=_0 docStoreSegment=_0
>> > docStoreOffset=0 flushDocs=true flushDeletes=false flushDocStores=false
>> > numDocs=171638 numBufDelTerms=171638
>> > IW 0 [UpdWriterBuild]:   index before flush
>> > IW 0 [UpdWriterBuild]: DW: flush postings as segment _0 numDocs=171638
>> > IW 0 [UpdWriterBuild]: DW:   oldRAMSize=929523712 newFlushedSize=573198529
>> > docs/MB=313.985 new/old=61.666%
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [1 segments ; isCommit = false]
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [1 segments ; isCommit = false]
>> > IW 0 [UpdWriterBuild]: LMP: findMerges: 1 segments
>> > IW 0 [UpdWriterBuild]: LMP:   level 8.008305 to 8.758305: 1 segments
>> > IW 0 [UpdWriterBuild]: CMS: now merge
>> > IW 0 [UpdWriterBuild]: CMS:   index: _0:C171638->_0
>> > IW 0 [UpdWriterBuild]: CMS:   no more merges pending; now return
>> > IW 0 [UpdWriterBuild]: DW:   RAM: now flush @ usedMB=857.977
>> > allocMB=901.32 deletesMB=52.274 triggerMB=910.25
>> > IW 0 [UpdWriterBuild]:   flush: segment=_1 docStoreSegment=_0
>> > docStoreOffset=171638 flushDocs=true flushDeletes=false flushDocStores=false
>> > numDocs=204995 numBufDelTerms=204995
>> > IW 0 [UpdWriterBuild]:   index before flush _0:C171638->_0
>> > IW 0 [UpdWriterBuild]: DW: flush postings as segment _1 numDocs=204995
>> > IW 0 [UpdWriterBuild]: DW:   oldRAMSize=899653632 newFlushedSize=544283851
>> > docs/MB=394.928 new/old=60.499%
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [2 segments ; isCommit = false]
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [2 segments ; isCommit = false]
>> > IW 0 [UpdWriterBuild]: LMP: findMerges: 2 segments
>> > IW 0 [UpdWriterBuild]: LMP:   level 8.008305 to 8.758305: 2 segments
>> > IW 0 [UpdWriterBuild]: CMS: now merge
>> > IW 0 [UpdWriterBuild]: CMS:   index: _0:C171638->_0 _1:C204995->_0
>> > IW 0 [UpdWriterBuild]: CMS:   no more merges pending; now return
>> > IW 0 [UpdWriterBuild]: DW:   RAM: now balance allocations: usedMB=834.645 vs
>> > trigger=910.25 allocMB=901.32 deletesMB=75.627 vs trigger=955.762
>> > byteBlockFree=35.938 charBlockFree=8.938
>> > IW 0 [UpdWriterBuild]: DW:     nothing to free
>> > IW 0 [UpdWriterBuild]: DW:     after free: freedMB=66.613 usedMB=910.272
>> > allocMB=834.707
>> > IW 0 [UpdWriterBuild]:   flush: segment=_2 docStoreSegment=_0
>> > docStoreOffset=376633 flushDocs=true flushDeletes=false flushDocStores=false
>> > numDocs=168236 numBufDelTerms=168236
>> > IW 0 [UpdWriterBuild]:   index before flush _0:C171638->_0 _1:C204995->_0
>> > IW 0 [UpdWriterBuild]: DW: flush postings as segment _2 numDocs=168236
>> > IW 0 [UpdWriterBuild]: DW:   oldRAMSize=875188224 newFlushedSize=530720464
>> > docs/MB=332.394 new/old=60.641%
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [3 segments ; isCommit = false]
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [3 segments ; isCommit = false]
>> > IW 0 [UpdWriterBuild]: LMP: findMerges: 3 segments
>> > IW 0 [UpdWriterBuild]: LMP:   level 8.008305 to 8.758305: 3 segments
>> > IW 0 [UpdWriterBuild]: CMS: now merge
>> > IW 0 [UpdWriterBuild]: CMS:   index: _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0
>> > IW 0 [UpdWriterBuild]: CMS:   no more merges pending; now return
>> > IW 0 [UpdWriterBuild]: DW:   RAM: now flush @ usedMB=814.282
>> > allocMB=835.832 deletesMB=95.997 triggerMB=910.25
>> > IW 0 [UpdWriterBuild]:   flush: segment=_3 docStoreSegment=_0
>> > docStoreOffset=544869 flushDocs=true flushDeletes=false flushDocStores=false
>> > numDocs=146894 numBufDelTerms=146894
>> > IW 0 [UpdWriterBuild]:   index before flush _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0
>> > IW 0 [UpdWriterBuild]: DW: flush postings as segment _3 numDocs=146894
>> > IW 0 [UpdWriterBuild]: DW:   oldRAMSize=853836800 newFlushedSize=522388771
>> > docs/MB=294.856 new/old=61.181%
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [4 segments ; isCommit = false]
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [4 segments ; isCommit = false]
>> > IW 0 [UpdWriterBuild]: LMP: findMerges: 4 segments
>> > IW 0 [UpdWriterBuild]: LMP:   level 8.008305 to 8.758305: 4 segments
>> > IW 0 [UpdWriterBuild]: CMS: now merge
>> > IW 0 [UpdWriterBuild]: CMS:   index: _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0
>> > IW 0 [UpdWriterBuild]: CMS:   no more merges pending; now return
>> > IW 0 [UpdWriterBuild]: DW:   RAM: now flush @ usedMB=791.724
>> > allocMB=835.832 deletesMB=118.535 triggerMB=910.25
>> > IW 0 [UpdWriterBuild]:   flush: segment=_4 docStoreSegment=_0
>> > docStoreOffset=691763 flushDocs=true flushDeletes=false flushDocStores=false
>> > numDocs=162034 numBufDelTerms=162034
>> > IW 0 [UpdWriterBuild]:   index before flush _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0
>> > IW 0 [UpdWriterBuild]: DW: flush postings as segment _4 numDocs=162034
>> > IW 0 [UpdWriterBuild]: DW:   oldRAMSize=830182400 newFlushedSize=498741034
>> > docs/MB=340.668 new/old=60.076%
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [5 segments ; isCommit = false]
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [5 segments ; isCommit = false]
>> > IW 0 [UpdWriterBuild]: LMP: findMerges: 5 segments
>> > IW 0 [UpdWriterBuild]: LMP:   level 8.008305 to 8.758305: 5 segments
>> > IW 0 [UpdWriterBuild]: CMS: now merge
>> > IW 0 [UpdWriterBuild]: CMS:   index: _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0
>> > IW 0 [UpdWriterBuild]: CMS:   no more merges pending; now return
>> > IW 0 [UpdWriterBuild]: DW:   RAM: now balance allocations: usedMB=771.396 vs
>> > trigger=910.25 allocMB=835.832 deletesMB=138.875 vs trigger=955.762
>> > byteBlockFree=39.688 charBlockFree=7.188
>> > IW 0 [UpdWriterBuild]: DW:     nothing to free
>> > IW 0 [UpdWriterBuild]: DW:     after free: freedMB=64.374 usedMB=910.271
>> > allocMB=771.458
>> > IW 0 [UpdWriterBuild]:   flush: segment=_5 docStoreSegment=_0
>> > docStoreOffset=853797 flushDocs=true flushDeletes=false flushDocStores=false
>> > numDocs=146250 numBufDelTerms=146250
>> > IW 0 [UpdWriterBuild]:   index before flush _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0
>> > IW 0 [UpdWriterBuild]: DW: flush postings as segment _5 numDocs=146250
>> > IW 0 [UpdWriterBuild]: DW:   oldRAMSize=808866816 newFlushedSize=485212402
>> > docs/MB=316.056 new/old=59.987%
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [6 segments ; isCommit = false]
>> > IFD [UpdWriterBuild]: now checkpoint "segments_1" [6 segments ; isCommit = false]
>> > IW 0 [UpdWriterBuild]: LMP: findMerges: 6 segments
>> > IW 0 [UpdWriterBuild]: LMP:   level 8.008305 to 8.758305: 6 segments
>> > IW 0 [UpdWriterBuild]: CMS: now merge
>> > IW 0 [UpdWriterBuild]: CMS:   index: _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0
>> > IW 0 [UpdWriterBuild]: CMS:   no more merges pending; now return
>> > IW 0 [Indexer]: commit: start
>> > IW 0 [Indexer]: commit: now prepare
>> > IW 0 [Indexer]: prepareCommit: flush
>> > IW 0 [Indexer]:   flush: segment=_6 docStoreSegment=_0
>> > docStoreOffset=1000047 flushDocs=true flushDeletes=true flushDocStores=true
>> > numDocs=76978 numBufDelTerms=76978
>> > IW 0 [Indexer]:   index before flush _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0
>> > IW 0 [Indexer]:   flush shared docStore segment _0
>> > IW 0 [Indexer]: DW: closeDocStore: 2 files to flush to segment _0
>> > numDocs=1077025
>> > IW 0 [Indexer]: DW: flush postings as segment _6 numDocs=76978
>> > IW 0 [Indexer]: DW:   oldRAMSize=486968320 newFlushedSize=273168136
>> > docs/MB=295.486 new/old=56.096%
>> > IFD [Indexer]: now checkpoint "segments_1" [7 segments ; isCommit =
>> false]
>> > IW 0 [Indexer]: DW: apply 1077025 buffered deleted terms and 0 deleted
>> > docIDs and 0 deleted queries on 7 segments.
>> > IFD [Indexer]: now checkpoint "segments_1" [7 segments ; isCommit =
>> false]
>> > IW 0 [Indexer]: LMP: findMerges: 7 segments
>> > IW 0 [Indexer]: LMP:   level 8.008305 to 8.758305: 7 segments
>> > IW 0 [Indexer]: CMS: now merge
>> > IW 0 [Indexer]: CMS:   index: _0:C171638->_0 _1:C204995->_0 _2:C168236->_0
>> > _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > IW 0 [Indexer]: CMS:   no more merges pending; now return
>> > IW 0 [Indexer]: startCommit(): start sizeInBytes=0
>> > IW 0 [Indexer]: startCommit index=_0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > changeCount=21
>> > IW 0 [Indexer]: now sync _0.tis
>> > IW 0 [Indexer]: now sync _5.prx
>> > IW 0 [Indexer]: now sync _3.frq
>> > IW 0 [Indexer]: now sync _3.tii
>> > IW 0 [Indexer]: now sync _1.frq
>> > IW 0 [Indexer]: now sync _6.frq
>> > IW 0 [Indexer]: now sync _4.prx
>> > IW 0 [Indexer]: now sync _4.fnm
>> > IW 0 [Indexer]: now sync _2.tii
>> > IW 0 [Indexer]: now sync _3.fnm
>> > IW 0 [Indexer]: now sync _1.fnm
>> > IW 0 [Indexer]: now sync _6.tis
>> > IW 0 [Indexer]: now sync _4.frq
>> > IW 0 [Indexer]: now sync _5.nrm
>> > IW 0 [Indexer]: now sync _5.tis
>> > IW 0 [Indexer]: now sync _1.tii
>> > IW 0 [Indexer]: now sync _4.tis
>> > IW 0 [Indexer]: now sync _0.prx
>> > IW 0 [Indexer]: now sync _3.nrm
>> > IW 0 [Indexer]: now sync _4.tii
>> > IW 0 [Indexer]: now sync _0.nrm
>> > IW 0 [Indexer]: now sync _5.fnm
>> > IW 0 [Indexer]: now sync _1.tis
>> > IW 0 [Indexer]: now sync _0.fnm
>> > IW 0 [Indexer]: now sync _2.prx
>> > IW 0 [Indexer]: now sync _6.tii
>> > IW 0 [Indexer]: now sync _4.nrm
>> > IW 0 [Indexer]: now sync _2.frq
>> > IW 0 [Indexer]: now sync _5.frq
>> > IW 0 [Indexer]: now sync _3.prx
>> > IW 0 [Indexer]: now sync _5.tii
>> > IW 0 [Indexer]: now sync _2.fnm
>> > IW 0 [Indexer]: now sync _1.prx
>> > IW 0 [Indexer]: now sync _2.tis
>> > IW 0 [Indexer]: now sync _0.tii
>> > IW 0 [Indexer]: now sync _6.prx
>> > IW 0 [Indexer]: now sync _0.frq
>> > IW 0 [Indexer]: now sync _6.fnm
>> > IW 0 [Indexer]: now sync _0.fdx
>> > IW 0 [Indexer]: now sync _6.nrm
>> > IW 0 [Indexer]: now sync _0.fdt
>> > IW 0 [Indexer]: now sync _1.nrm
>> > IW 0 [Indexer]: now sync _2.nrm
>> > IW 0 [Indexer]: now sync _3.tis
>> > IW 0 [Indexer]: done all syncs
>> > IW 0 [Indexer]: commit: pendingCommit != null
>> > IW 0 [Indexer]: commit: wrote segments file "segments_2"
>> > IFD [Indexer]: now checkpoint "segments_2" [7 segments ; isCommit =
>> true]
>> > IFD [Indexer]: deleteCommits: now decRef commit "segments_1"
>> > IFD [Indexer]: delete "segments_1"
>> > IW 0 [Indexer]: commit: done
>> > IW 0 [Indexer]: optimize: index now _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > IW 0 [Indexer]:   flush: segment=null docStoreSegment=_6 docStoreOffset=0
>> > flushDocs=false flushDeletes=true flushDocStores=false numDocs=0
>> > numBufDelTerms=0
>> > IW 0 [Indexer]:   index before flush _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > IW 0 [Indexer]: add merge to pendingMerges: _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > [optimize] [total 1 pending]
>> > IW 0 [Indexer]: CMS: now merge
>> > IW 0 [Indexer]: CMS:   index: _0:C171638->_0 _1:C204995->_0 _2:C168236->_0
>> > _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > IW 0 [Indexer]: CMS:   consider merge _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > into _7 [optimize]
>> > IW 0 [Indexer]: CMS:     launch new thread [Lucene Merge Thread #0]
>> > IW 0 [Indexer]: CMS:   no more merges pending; now return
>> > IW 0 [Lucene Merge Thread #0]: CMS:   merge thread: start
>> > IW 0 [Lucene Merge Thread #0]: now merge
>> >  merge=_0:C171638->_0 _1:C204995->_0 _2:C168236->_0 _3:C146894->_0
>> > _4:C162034->_0 _5:C146250->_0 _6:C76978->_0 into _7 [optimize]
>> >  merge=org.apache.lucene.index.mergepolicy$oneme...@78688954
>> >  index=_0:C171638->_0 _1:C204995->_0 _2:C168236->_0 _3:C146894->_0
>> > _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > IW 0 [Lucene Merge Thread #0]: merging _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > into _7 [optimize]
>> > IW 0 [Lucene Merge Thread #0]: merge: total 1077025 docs
>> > IW 0 [Lucene Merge Thread #0]: commitMerge: _0:C171638->_0 _1:C204995->_0
>> > _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > into _7 [optimize] index=_0:C171638->_0 _1:C204995->_0 _2:C168236->_0
>> > _3:C146894->_0 _4:C162034->_0 _5:C146250->_0 _6:C76978->_0
>> > IW 0 [Lucene Merge Thread #0]: commitMergeDeletes _0:C171638->_0
>> > _1:C204995->_0 _2:C168236->_0 _3:C146894->_0 _4:C162034->_0 _5:C146250->_0
>> > _6:C76978->_0 into _7 [optimize]
>> > IFD [Lucene Merge Thread #0]: now checkpoint "segments_2" [1 segments ;
>> > isCommit = false]
>> > IW 0 [Lucene Merge Thread #0]: CMS:   merge thread: done
>> > IW 0 [Indexer]: now flush at close
>> > IW 0 [Indexer]:   flush: segment=null docStoreSegment=_6 docStoreOffset=0
>> > flushDocs=false flushDeletes=true flushDocStores=true numDocs=0
>> > numBufDelTerms=0
>> > IW 0 [Indexer]:   index before flush _7:C1077025->_0
>> > IW 0 [Indexer]:   flush shared docStore segment _6
>> > IW 0 [Indexer]: DW: closeDocStore: 0 files to flush to segment _6 numDocs=0
>> > IW 0 [Indexer]: CMS: now merge
>> > IW 0 [Indexer]: CMS:   index: _7:C1077025->_0
>> > IW 0 [Indexer]: CMS:   no more merges pending; now return
>> > IW 0 [Indexer]: now call final commit()
>> > IW 0 [Indexer]: startCommit(): start sizeInBytes=0
>> > IW 0 [Indexer]: startCommit index=_7:C1077025->_0 changeCount=23
>> > IW 0 [Indexer]: now sync _7.prx
>> > IW 0 [Indexer]: now sync _7.fnm
>> > IW 0 [Indexer]: now sync _7.tis
>> > IW 0 [Indexer]: now sync _7.nrm
>> > IW 0 [Indexer]: now sync _7.tii
>> > IW 0 [Indexer]: now sync _7.frq
>> > IW 0 [Indexer]: done all syncs
>> > IW 0 [Indexer]: commit: pendingCommit != null
>> > IW 0 [Indexer]: commit: wrote segments file "segments_3"
>> > IFD [Indexer]: now checkpoint "segments_3" [1 segments ; isCommit =
>> true]
>> > IFD [Indexer]: deleteCommits: now decRef commit "segments_2"
>> > IFD [Indexer]: delete "_0.tis"
>> > IFD [Indexer]: delete "_5.prx"
>> > IFD [Indexer]: delete "_3.tii"
>> > IFD [Indexer]: delete "_3.frq"
>> > IFD [Indexer]: delete "_1.frq"
>> > IFD [Indexer]: delete "_6.frq"
>> > IFD [Indexer]: delete "_4.prx"
>> > IFD [Indexer]: delete "_4.fnm"
>> > IFD [Indexer]: delete "_2.tii"
>> > IFD [Indexer]: delete "_3.fnm"
>> > IFD [Indexer]: delete "_1.fnm"
>> > IFD [Indexer]: delete "_6.tis"
>> > IFD [Indexer]: delete "_4.frq"
>> > IFD [Indexer]: delete "_5.nrm"
>> > IFD [Indexer]: delete "_5.tis"
>> > IFD [Indexer]: delete "_1.tii"
>> > IFD [Indexer]: delete "_4.tis"
>> > IFD [Indexer]: delete "_0.prx"
>> > IFD [Indexer]: delete "_3.nrm"
>> > IFD [Indexer]: delete "_4.tii"
>> > IFD [Indexer]: delete "_0.nrm"
>> > IFD [Indexer]: delete "_5.fnm"
>> > IFD [Indexer]: delete "_1.tis"
>> > IFD [Indexer]: delete "_0.fnm"
>> > IFD [Indexer]: delete "_2.prx"
>> > IFD [Indexer]: delete "_6.tii"
>> > IFD [Indexer]: delete "_4.nrm"
>> > IFD [Indexer]: delete "_2.frq"
>> > IFD [Indexer]: delete "_5.frq"
>> > IFD [Indexer]: delete "_3.prx"
>> > IFD [Indexer]: delete "_5.tii"
>> > IFD [Indexer]: delete "_2.fnm"
>> > IFD [Indexer]: delete "_1.prx"
>> > IFD [Indexer]: delete "_2.tis"
>> > IFD [Indexer]: delete "_0.tii"
>> > IFD [Indexer]: delete "_6.prx"
>> > IFD [Indexer]: delete "_0.frq"
>> > IFD [Indexer]: delete "segments_2"
>> > IFD [Indexer]: delete "_6.fnm"
>> > IFD [Indexer]: delete "_6.nrm"
>> > IFD [Indexer]: delete "_1.nrm"
>> > IFD [Indexer]: delete "_2.nrm"
>> > IFD [Indexer]: delete "_3.tis"
>> > IW 0 [Indexer]: commit: done
>> > IW 0 [Indexer]: at close: _7:C1077025->_0
>> >
>> > I see no errors.
>> > Peter
>> >
>> >
>> > On Tue, Oct 27, 2009 at 10:44 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> >
>> >>
>> >>
>> >> On Tue, Oct 27, 2009 at 10:37 AM, Michael McCandless <
>> >> luc...@mikemccandless.com> wrote:
>> >>
>> >>> OK that exception looks more reasonable, for a disk full event.
>> >>>
>> >>> But, I can't tell from your followon emails: did this lead to index
>> >>> corruption?
>> >>>
>> >>
>> >> Yes, but this may be caused by the application ignoring a Lucene exception
>> >> somewhere else. I will chase this down.
>> >>
>> >>>
>> >>> Also, I noticed you're using a rather old 1.6.0 JRE (1.6.0_03) -- you
>> >>> really should upgrade that to the latest 1.6.0 -- there's at least one
>> >>> known problem with Lucene and early 1.6.0 JREs.
>> >>>
>> >>
>> >> Yes, I remember this problem - that's why we stayed at _03.
>> >> Thanks.
>> >>
>> >>>
>> >>> Mike
>> >>>
>> >>> On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> >>> > After rebuilding the corrupted indexes, the low disk space exception
>> >>> > is now occurring as expected. Sorry for the distraction.
>> >>> >
>> >>> > fyi, here are the details:
>> >>> >
>> >>> >  java.io.IOException: There is not enough space on the disk
>> >>> >    at java.io.RandomAccessFile.writeBytes(Native Method)
>> >>> >    at java.io.RandomAccessFile.write(Unknown Source)
>> >>> >    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
>> >>> >    at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
>> >>> >    at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
>> >>> >    at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109)
>> >>> >    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.close(SimpleFSDirectory.java:199)
>> >>> >    at org.apache.lucene.index.FieldsWriter.close(FieldsWriter.java:144)
>> >>> >    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:357)
>> >>> >    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
>> >>> >    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5011)
>> >>> >    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4596)
>> >>> >    at org.apache.lucene.index.IndexWriter.resolveExternalSegments(IndexWriter.java:3786)
>> >>> >    at org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3695)
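>> >>> > (One way to be sure nothing swallows an exception like this is to
>> >>> > wrap the add -- a sketch against the 2.9 API, with "writer" the
>> >>> > IndexWriter and srcDir just a placeholder:
>> >>> >
>> >>> >     try {
>> >>> >         writer.addIndexesNoOptimize(new Directory[] { srcDir });
>> >>> >     } catch (IOException ioe) {
>> >>> >         ioe.printStackTrace();  // log the full trace before rethrowing
>> >>> >         throw ioe;
>> >>> >     }
>> >>> > )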
>> >>> >
>> >>> >
>> >>> > And the corresponding index info log:
>> >>> >
>> >>> > IFD [Indexer]: setInfoStream
>> >>> > deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@256ef705
>> >>> > IW 1 [Indexer]: setInfoStream:
>> >>> > dir=org.apache.lucene.store.simplefsdirect...@d:\mnsavs\lresumes3\lresumes3.luc\lresumes3.update.main.4
>> >>> > autoCommit=false
>> >>> > mergepolicy=org.apache.lucene.index.logbytesizemergepol...@181b7c76
>> >>> > mergescheduler=org.apache.lucene.index.concurrentmergeschedu...@34883357
>> >>> > rambuffersizemb=16.0
>> >>> > maxBufferedDocs=-1 maxBuffereDeleteTerms=-1
>> >>> > maxFieldLength=2147483647 index=
>> >>> > IW 1 [Indexer]: flush at addIndexesNoOptimize
>> >>> > IW 1 [Indexer]:   flush: segment=null docStoreSegment=null docStoreOffset=0
>> >>> > flushDocs=false flushDeletes=true flushDocStores=false numDocs=0
>> >>> > numBufDelTerms=0
>> >>> > IW 1 [Indexer]:   index before flush
>> >>> > IW 1 [Indexer]: now start transaction
>> >>> > IW 1 [Indexer]: LMP: findMerges: 2 segments
>> >>> > IW 1 [Indexer]: LMP:   level 8.774518 to 9.524518: 1 segments
>> >>> > IW 1 [Indexer]: LMP:   level 6.2973914 to 7.0473914: 1 segments
>> >>> > IW 1 [Indexer]: CMS: now merge
>> >>> > IW 1 [Indexer]: CMS:   index: _7:Cx1075533->_0** _8:Cx2795**
>> >>> > IW 1 [Indexer]: CMS:   no more merges pending; now return
>> >>> > IW 1 [Indexer]: add merge to pendingMerges: _7:Cx1075533->_0 [total 1
>> >>> > pending]
>> >>> > IW 1 [Indexer]: now merge
>> >>> >  merge=_7:Cx1075533->_0 into _0 [mergeDocStores]
>> >>> >  merge=org.apache.lucene.index.mergepolicy$oneme...@4d480ea
>> >>> >  index=_7:Cx1075533->_0** _8:Cx2795**
>> >>> > IW 1 [Indexer]: merging _7:Cx1075533->_0 into _0 [mergeDocStores]
>> >>> > IW 1 [Indexer]: merge: total 1074388 docs
>> >>> > IW 1 [Indexer]: commitMerge: _7:Cx1075533->_0 into _0 [mergeDocStores]
>> >>> > index=_7:Cx1075533->_0** _8:Cx2795**
>> >>> > IW 1 [Indexer]: commitMergeDeletes _7:Cx1075533->_0 into _0
>> >>> > [mergeDocStores]
>> >>> > IFD [Indexer]: now checkpoint "segments_1" [2 segments ; isCommit = false]
>> >>> > IW 1 [Indexer]: LMP: findMerges: 2 segments
>> >>> > IW 1 [Indexer]: LMP:   level 8.864886 to 9.614886: 1 segments
>> >>> > IW 1 [Indexer]: LMP:   level 6.2973914 to 7.0473914: 1 segments
>> >>> > IW 1 [Indexer]: add merge to pendingMerges: _8:Cx2795 [total 1 pending]
>> >>> > IW 1 [Indexer]: now merge
>> >>> >  merge=_8:Cx2795 into _1 [mergeDocStores]
>> >>> >  merge=org.apache.lucene.index.mergepolicy$oneme...@606f8b2b
>> >>> >  index=_0:C1074388 _8:Cx2795**
>> >>> > IW 1 [Indexer]: merging _8:Cx2795 into _1 [mergeDocStores]
>> >>> > IW 1 [Indexer]: merge: total 2795 docs
>> >>> > IW 1 [Indexer]: handleMergeException: merge=_8:Cx2795 into _1
>> >>> > [mergeDocStores] exc=java.io.IOException: There is not enough space on
>> >>> > the disk
>> >>> > IW 1 [Indexer]: hit exception during merge
>> >>> > IFD [Indexer]: refresh [prefix=_1]: removing newly created unreferenced
>> >>> > file "_1.fdt"
>> >>> > IFD [Indexer]: delete "_1.fdt"
>> >>> > IFD [Indexer]: refresh [prefix=_1]: removing newly created unreferenced
>> >>> > file "_1.fdx"
>> >>> > IFD [Indexer]: delete "_1.fdx"
>> >>> > IFD [Indexer]: refresh [prefix=_1]: removing newly created unreferenced
>> >>> > file "_1.fnm"
>> >>> > IFD [Indexer]: delete "_1.fnm"
>> >>> > IW 1 [Indexer]: now rollback transaction
>> >>> > IW 1 [Indexer]: all running merges have aborted
>> >>> > IFD [Indexer]: now checkpoint "segments_1" [0 segments ; isCommit =
>> >>> false]
>> >>> > IFD [Indexer]: delete "_0.nrm"
>> >>> > IFD [Indexer]: delete "_0.tis"
>> >>> > IFD [Indexer]: delete "_0.fnm"
>> >>> > IFD [Indexer]: delete "_0.tii"
>> >>> > IFD [Indexer]: delete "_0.frq"
>> >>> > IFD [Indexer]: delete "_0.fdx"
>> >>> > IFD [Indexer]: delete "_0.prx"
>> >>> > IFD [Indexer]: delete "_0.fdt"
>> >>> >
>> >>> >
>> >>> > Peter
>> >>> >
>> >>> > On Mon, Oct 26, 2009 at 3:59 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> >>> >
>> >>> >>
>> >>> >>
>> >>> >> On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless <
>> >>> >> luc...@mikemccandless.com> wrote:
>> >>> >>
>> >>> >>> On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> >>> >>> > On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless <
>> >>> >>> > luc...@mikemccandless.com> wrote:
>> >>> >>> >
>> >>> >>> >> On Mon, Oct 26, 2009 at 10:44 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> >>> >>> >> > Even running in console mode, the exception is difficult to
>> >>> >>> >> > interpret. Here's an exception that I think occurred during an
>> >>> >>> >> > add document, commit or close:
>> >>> >>> >> > doc counts differ for segment _g: field Reader shows 137 but
>> >>> >>> >> > segmentInfo shows 5777
>> >>> >>> >>
>> >>> >>> >> That's spooky.  Do you have the full exception for this one?  What
>> >>> >>> >> IO system are you running on?  (Is it just a local drive on your
>> >>> >>> >> Windows computer?) It's almost as if the IO system is not generating
>> >>> >>> >> an IOException to Java when disk fills up.
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> > Index and code are all on a local drive. There is no other exception
>> >>> >>> > coming back - just what I reported.
>> >>> >>>
>> >>> >>> But, you didn't report a traceback for this first one?
>> >>> >>>
>> >>> >>
>> >>> >> Yes, I need to add some more printStackTrace calls.
>> >>> >>
>> >>> >>
>> >>> >>>
>> >>> >>> >> > I ensured that the disk space was low before updating the index.
>> >>> >>> >>
>> >>> >>> >> You mean, to intentionally test the disk-full case?
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> > Yes, that's right.
>> >>> >>>
>> >>> >>> OK.  Can you turn on IndexWriter's infoStream, get this disk full /
>> >>> >>> corruption to happen again, and post back the resulting output?  Make
>> >>> >>> sure your index first passes CheckIndex before starting (so we don't
>> >>> >>> begin the test w/ any pre-existing index corruption).
>> >>> >>>
>> >>> >>
>> >>> >> Good point about CheckIndex.  I've already found 2 bad ones. I will
>> >>> >> build new indexes from scratch. This will take a while.
>> >>> >>
>> >>> >>
>> >>> >>> >> > On another occasion, the exception was:
>> >>> >>> >> > background merge hit exception: _0:C1080260 _1:C139 _2:C123 _3:C107
>> >>> >>> >> > _4:C126 _5:C121 _6:C126 _7:C711 _8:C116 into _9 [optimize]
>> >>> >>> >> > [mergeDocStores]
>> >>> >>> >>
>> >>> >>> >> In this case, the SegmentMerger was trying to open this segment, but
>> >>> >>> >> on attempting to read the first int from the fdx (fields index) file
>> >>> >>> >> for one of the segments, it hit EOF.
>> >>> >>> >>
>> >>> >>> >> This is also spooky -- this looks like index corruption, which
>> >>> >>> >> should never happen on hitting disk full.
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> > That's what I thought, too. Could Lucene be catching the IOException
>> >>> >>> > and turning it into a different exception?
>> >>> >>>
>> >>> >>> I think that's unlikely, but I guess possible.  We have "disk full"
>> >>> >>> tests in the unit tests that throw an IOException at different times.
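>> >>> >>> (IIRC those tests simulate the full disk with MockRAMDirectory from
>> >>> >>> the test sources -- roughly, as a sketch:
>> >>> >>>
>> >>> >>>     MockRAMDirectory dir = new MockRAMDirectory();
>> >>> >>>     dir.setMaxSizeInBytes(dir.sizeInBytes() + 50 * 1024);
>> >>> >>>     // writes past the cap throw IOException, like a full disk
>> >>> >>> )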
>> >>> >>>
>> >>> >>> What exact Windows version are you using?  The local drive is NTFS?
>> >>> >>>
>> >>> >>
>> >>> >> Windows Server 2003 Enterprise x64 SP2. Local drive is NTFS
>> >>> >>
>> >>> >>
>> >>> >>>
>> >>> >>> Mike
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>
>
