So, it's a little clearer. I get the Array-out-of-bounds exception if I'm re-indexing some already indexed documents -- if there are deletions involved. I get the CorruptIndexException if I'm indexing freshly -- no deletions. Here's an example of that (with the latest nightly). I removed the existing index, then reindexed the collection six UpLib docs at a time, till I hit the corruption.
Bill /Library/Java/Home/bin/java -Dcom.parc.uplib.indexing.debugMode=true "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*" -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=20000 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01113-86-6099-767 01113-86-5485-936 01113-86-0975-795 01113-62-2881-882 01113-44-7730-580 01113-44-7684-477 thr002: acquiring lock: LuceneIndex... thr002: acquired lock: LuceneIndex* thr002: releasing lock: LuceneIndex* thr002: indexing output is <updating doc_root_dir is /local/janssen-uplib/docs index file is /local/janssen-uplib/index and it exists. Deleted 0 existing instances of 01113-86-6099-767 Deleted 0 existing instances of 01113-86-5485-936 Deleted 0 existing instances of 01113-86-0975-795 Deleted 0 existing instances of 01113-62-2881-882 Deleted 0 existing instances of 01113-44-7730-580 Deleted 0 existing instances of 01113-44-7684-477 IFD [main]: setInfoStream [EMAIL PROTECTED] IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index autoCommit=true [EMAIL PROTECTED] [EMAIL PROTECTED] ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=_94:c8686 IW 0 [main]: setMaxFieldLength 2147483647 Working on document /local/janssen-uplib/docs/01113-86-6099-767 Adding header 'abstract' IT to 01113-86-6099-767 Adding header 'apparent-mime-type' I to 01113-86-6099-767 Adding header 'authors' IT to 01113-86-6099-767 Adding header 'categories' I (flowport) to 01113-86-6099-767 Adding header 'categories' I (article) to 01113-86-6099-767 Adding header 'categories' I (flowport) to 01113-86-6099-767 Adding header 'categories' I (flowport) to 01113-86-6099-767 Adding header 'citation' I to 01113-86-6099-767 Adding header 'date' I (20030800) to 01113-86-6099-767 Adding header 'sha-hash' I to 01113-86-6099-767 Adding header 'title' IT (The Great Chive Division) to 01113-86-6099-767 Created empty doc Document<stored/uncompressed,indexed<id:01113-86-6099-767> stored/uncompressed,indexed<uplibdate:20050418> stored/uncompressed,indexed<uplibtype:whole>> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (4219): Question: My chives have grown Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-86-6099-767 (2 versions) Working on document /local/janssen-uplib/docs/01113-86-5485-936 Adding header 'abstract' IT to 01113-86-5485-936 Adding header 'apparent-mime-type' I to 01113-86-5485-936 Adding header 'authors' IT to 01113-86-5485-936 Adding header 'categories' I (paper) to 01113-86-5485-936 Adding header 'categories' I (sensepad) to 01113-86-5485-936 Adding header 'citation' I to 01113-86-5485-936 Adding header 'date' I (20040524) to 01113-86-5485-936 Adding header 'sha-hash' I to 01113-86-5485-936 Adding header 'title' IT (Designing Interaction, not Interfaces) to 01113-86-5485-936 Created empty doc Document<stored/uncompressed,indexed<id:01113-86-5485-936> stored/uncompressed,indexed<uplibdate:20050418> stored/uncompressed,indexed<uplibtype:whole>> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (3855): Designing Interaction, not Int page 1 (5688): Figure 1. Interaction as a phe page 2 (5831): Interaction models can be eval page 3 (5770): Reification turns concepts and page 4 (5558): Figure 6. A mock-up of the DPI page 5 (5963): In joint work with Yves Guiard page 6 (6819): I propose making interactions page 7 (5622): Graphical Application. Proc. A Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-86-5485-936 (9 versions) Working on document /local/janssen-uplib/docs/01113-86-0975-795 Adding header 'apparent-mime-type' I to 01113-86-0975-795 Adding header 'categories' I (article) to 01113-86-0975-795 Adding header 'date' I (20050414) to 01113-86-0975-795 Adding header 'sha-hash' I to 01113-86-0975-795 Adding header 'source' IT to 01113-86-0975-795 Created empty doc Document<stored/uncompressed,indexed<id:01113-86-0975-795> stored/uncompressed,indexed<uplibdate:20050418> stored/uncompressed,indexed<uplibtype:whole>> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (1851): About sponsorship Simplifying page 1 (2900): Latvia and Lithuania, Estonia' page 2 (3088): How much fairness is gained fo page 3 (5317): At the time of its reform, Est page 4 (1101): In part, the tax system is bur Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-86-0975-795 (6 versions) Working on document /local/janssen-uplib/docs/01113-62-2881-882 Adding header 'apparent-mime-type' I to 01113-62-2881-882 Adding header 'categories' I (article) to 01113-62-2881-882 Adding header 'date' I (20050328) to 01113-62-2881-882 Adding header 'keywords' I (neuroeconomics) to 01113-62-2881-882 Adding header 'sha-hash' I to 01113-62-2881-882 Adding header 'title' IT (Neuroeconomics: Why Logic Often Takes a Backseat) to 01113-62-2881-882 Created empty doc Document<stored/uncompressed,indexed<id:01113-62-2881-882> stored/uncompressed,indexed<uplibdate:20050415> stored/uncompressed,indexed<uplibtype:whole>> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (2957): Close Window MARCH 28, 2005 EC page 1 (3856): these attacks on rationality ? page 2 (484): Even believers in neuroeconomi Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-62-2881-882 (4 versions) Working on document /local/janssen-uplib/docs/01113-44-7730-580 Adding header 'apparent-mime-type' I to 01113-44-7730-580 Adding header 'categories' I (flowport) to 01113-44-7730-580 Adding header 'categories' I (receipt) to 01113-44-7730-580 Adding header 'categories' I (flowport) to 01113-44-7730-580 Adding header 'categories' I (flowport) to 01113-44-7730-580 Adding header 'comment' IT to 01113-44-7730-580 Adding header 'sha-hash' I to 01113-44-7730-580 Adding header 'title' IT (fax receipt for JCDL 2005 demo submission document) to 01113-44-7730-580 Created empty doc Document<stored/uncompressed,indexed<id:01113-44-7730-580> stored/uncompressed,indexed<uplibdate:20050413> stored/uncompressed,indexed<uplibtype:whole>> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (3383): Transmission fleport Dare T lo page 1 (2697): ACM Permission and Release For Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-44-7730-580 (3 versions) Working on document /local/janssen-uplib/docs/01113-44-7684-477 Adding header 'apparent-mime-type' I to 01113-44-7684-477 Adding header 'authors' IT to 01113-44-7684-477 Adding header 'categories' I (flowport) to 01113-44-7684-477 Adding header 'categories' I (article) to 01113-44-7684-477 Adding header 'categories' I (ebook) to 01113-44-7684-477 Adding header 'categories' I (flowport) to 01113-44-7684-477 Adding header 'categories' I (flowport) to 01113-44-7684-477 Adding header 'citation' I to 01113-44-7684-477 Adding header 'date' I (20050117) to 01113-44-7684-477 Adding header 'sha-hash' I to 01113-44-7684-477 Adding header 'title' IT (Signs of Life for E-books in 2004) to 01113-44-7684-477 Created empty doc Document<stored/uncompressed,indexed<id:01113-44-7684-477> stored/uncompressed,indexed<uplibdate:20050413> stored/uncompressed,indexed<uplibtype:whole>> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (6514): Signs ofLife for E-books in 20 Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-44-7684-477 (2 versions) Optimizing... IW 0 [main]: optimize: index now _94:c8686 IW 0 [main]: flush: segment=_95 docStoreSegment=_95 docStoreOffset=0 flushDocs=true flushDeletes=false flushDocStores=true numDocs=26 numBufDelTerms=0 IW 0 [main]: index before flush _94:c8686 closeDocStore: 2 files to flush to segment _95 flush postings as segment _95 numDocs=26 oldRAMSize=269104 newFlushedSize=114988 docs/MB=237.094 new/old=42.73% IW 0 [main]: checkpoint: wrote segments file "segments_ic" IFD [main]: now checkpoint "segments_ic" [2 segments ; isCommit = true] IFD [main]: deleteCommits: now remove commit "segments_ib" IFD [main]: delete "segments_ib" IW 0 [main]: checkpoint: wrote segments file "segments_id" IFD [main]: now checkpoint "segments_id" [2 segments ; isCommit = true] IFD [main]: deleteCommits: now remove commit "segments_ic" IFD [main]: delete "_95.fnm" IFD [main]: delete "_95.frq" IFD [main]: delete "_95.prx" IFD [main]: delete "_95.tis" IFD [main]: delete "_95.tii" IFD [main]: delete "_95.nrm" IFD [main]: delete "_95.fdx" IFD [main]: delete "_95.fdt" IFD [main]: delete "segments_ic" IW 0 [main]: LMP: findMerges: 2 segments IW 0 [main]: LMP: level 6.4501038 to 7.2001038: 1 segments IW 0 [main]: LMP: level -1.0 to 5.070234: 1 segments IW 0 [main]: CMS: now merge IW 0 [main]: CMS: index: _94:c8686 _95:c26 IW 0 [main]: CMS: no more merges pending; now return IW 0 [main]: add merge to pendingMerges: _94:c8686 _95:c26 [optimize] [total 1 pending] IW 0 [main]: CMS: now merge IW 0 [main]: CMS: index: _94:c8686 _95:c26 IW 0 [main]: CMS: consider merge _94:c8686 _95:c26 into _96 [optimize] IW 0 [main]: CMS: launch new thread [Thread-0] IW 0 [Thread-0]: CMS: merge thread: start IW 0 [main]: CMS: no more merges pending; now return IW 0 [Thread-0]: now merge merge=_94:c8686 _95:c26 into _96 [optimize] index=_94:c8686 _95:c26 IW 0 [Thread-0]: merging _94:c8686 _95:c26 into _96 [optimize] IW 0 [Thread-0]: merge: total 8712 docs IW 0 [Thread-0]: hit exception during merge; now refresh deleter on segment _96 IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fdt" IFD [Thread-0]: delete "_96.fdt" IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fdx" IFD [Thread-0]: delete "_96.fdx" IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fnm" IFD [Thread-0]: delete "_96.fnm" IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.frq" IFD [Thread-0]: delete "_96.frq" IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.prx" IFD [Thread-0]: delete "_96.prx" IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.tii" IFD [Thread-0]: delete "_96.tii" IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.tis" IFD [Thread-0]: delete "_96.tis" IW 0 [Thread-0]: hit exception during merge java.io.IOException: background merge hit exception: _94:c8686 _95:c26 into _96 [optimize] at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1705) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1654) at com.parc.uplib.indexing.LuceneIndexing.update(LuceneIndexing.java:414) at com.parc.uplib.indexing.LuceneIndexing.main(LuceneIndexing.java:659) Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221 ) at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:474) at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430) at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240) Exception in thread "Thread-0" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221 ) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274) Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221 ) at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:474) at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430) at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240) > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]