[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2010-09-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914401#action_12914401
 ] 

Michael McCandless commented on LUCENE-2666:


This looks like index corruption -- somehow the deleted docs bit vector is too 
small for that segment.  We have to get to the root cause of how the corruption 
happened.
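
The numbers in the report fit that reading. Assuming a bit vector that stores one bit per document in a byte array (document `n` lives in byte `n >> 3`), document 912 maps to byte 114, which is the index in the exception. The sketch below uses a hypothetical `TinyBitVector` class, not Lucene's `BitVector`, and the size of 912 is an illustrative guess; the thread does not state the real size of the `.del` file.

```java
// Minimal stand-in for a deleted-docs bit vector. This is NOT Lucene's
// BitVector; it only assumes the same one-bit-per-document byte layout.
final class TinyBitVector {
    private final byte[] bits;

    // Allocates enough bytes to hold `size` bits (one bit per document).
    TinyBitVector(int size) {
        this.bits = new byte[(size + 7) >> 3];
    }

    // Marks a document as deleted.
    void set(int bit) {
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    // Reads the deleted flag; throws if the vector is too small for `bit`.
    boolean get(int bit) {
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }
}

final class UndersizedBitVectorDemo {
    // Returns the byte index that the lookup tried to read when it failed,
    // or -1 when the lookup succeeds.
    static int failingByteIndex(int vectorSize, int docId) {
        TinyBitVector deleted = new TinyBitVector(vectorSize);
        try {
            deleted.get(docId);
            return -1;
        } catch (ArrayIndexOutOfBoundsException e) {
            return docId >> 3;
        }
    }

    public static void main(String[] args) {
        // A vector sized for 912 documents cannot answer for document 912.
        System.out.println(failingByteIndex(912, 912));
        // A vector sized for the full segment (914 documents) can.
        System.out.println(failingByteIndex(914, 912));
    }
}
```

Under these assumptions, any vector holding 912 or fewer bits fails first on document 912 of a 914-document segment, while documents 0 through 911 still read fine.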

EG if you can enable IndexWriter's infoStream, then get the corruption to 
happen, and post the resulting log...

Also, try enabling assertions... it may catch the corruption sooner.
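
Assertions are off by default in the JVM and are switched on with the `-ea` flag (or scoped to a package tree, e.g. `-ea:org.apache.lucene...`). A small check like the one below can confirm that the flag actually reached the process; the class name `AssertionStatus` is made up for this sketch.

```java
final class AssertionStatus {
    // Returns true only when the JVM runs this class with assertions enabled.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert executes only under -ea.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Logging this once at startup is a cheap way to avoid chasing a corruption run that silently had assertions disabled.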

Can you describe how you use Lucene?  Do you do any direct file IO in the index 
dir?  (eg, for backup/restore or something).

Are you certain only one writer is open on the index?  (Do you disable Lucene's 
locking?)

Which OS, filesystem, java impl are you using?

> ArrayIndexOutOfBoundsException when iterating over TermDocs
> ---
>
>                 Key: LUCENE-2666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2666
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.2
>            Reporter: Shay Banon
>
> A user got this very strange exception, and I managed to get the index that 
> it happens on. Basically, iterating over the TermDocs causes an AIOOB 
> exception. I easily reproduced it using the FieldCache, which does exactly 
> that (the field in question is indexed as numeric). Here is the exception:
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>   at 
> org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
>   at 
> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
>   at 
> org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
>   at TestMe.main(TestMe.java:56)
> It happens on the following segment: _26t docCount: 914 delCount: 1 
> delFileName: _26t_1.del
> And as you can see, it smells like a corner case (it fails for document 
> number 912, the AIOOB happens from the deleted docs). The code to recreate it 
> is simple:
> FSDirectory dir = FSDirectory.open(new File("index"));
> IndexReader reader = IndexReader.open(dir, true); // read-only reader
> IndexReader[] subReaders = reader.getSequentialSubReaders();
> for (IndexReader subReader : subReaders) {
>     // The sub-reader's superclass keeps its SegmentInfo in the non-public
>     // field "si", so it is read via reflection.
>     Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
>     field.setAccessible(true);
>     SegmentInfo si = (SegmentInfo) field.get(subReader);
>     System.out.println("--> " + si);
>     if (si.getDocStoreSegment().contains("_26t")) {
>         // this is the problematic one...
>         System.out.println("problematic one...");
>         FieldCache.DEFAULT.getLongs(subReader, "__documentdate",
>             FieldCache.NUMERIC_UTILS_LONG_PARSER);
>     }
> }
> Here is the result of a check index on that segment:
>   8 of 10: name=_26t docCount=914
> compound=true
> hasProx=true
> numFiles=2
> size (MB)=1.641
> diagnostics = {optimize=false, mergeFactor=10, 
> os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, 
> lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, 
> os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
> has deletions [delFileName=_26t_1.del]
> test: open reader.OK [1 deleted docs]
> test: fields..OK [32 fields]
> test: field norms.OK [32 fields]
> test: terms, freq, prox...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>   at 
> org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
>   at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>   at TestMe.main(TestMe.java:47)
> test: stored fields...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>   at 
> org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
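>   at TestMe.main(TestMe.java:47)
> test: term vectors...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>   at 
> org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)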

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-13 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981649#action_12981649
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi, 

I am getting this issue as well.
We are doing quite a lot of updates during indexing. Could this be 
causing the problem?

This seems to have happened only when we deployed to our Linux test server; it 
did not appear to occur on Mac OS X during development with the same data set.

Does this only affect Lucene 3.0.2? Would a rollback be a good workaround?




[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-13 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981650#action_12981650
 ] 

Nick Pellow commented on LUCENE-2666:
-

I've also noticed this occurring since I started using a numeric field and 
accessing its field cache for boosting.


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981843#action_12981843
 ] 

Michael McCandless commented on LUCENE-2666:


Can you run CheckIndex on this index and post the result?  And, enable 
assertions.

And if possible turn on IndexWriter's infoStream and capture/post the output 
leading up to the corruption.

Many updates during indexing is just fine... and I don't know whether rolling 
back to older Lucene releases will help (until we've isolated the issue).  But: 
maybe try rolling forward to 3.0.3?  It's possible you're hitting a bug fixed 
in 3.0.3 (though this doesn't ring a bell for me).


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-16 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982468#action_12982468
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi Michael, 

Thanks for the update. I have added an infoStream to the writer and triggered a 
re-index. Unfortunately, I didn't see the corruption occur this time.
I am about to deploy to a different environment so will let you know.

Unfortunately, we have already upgraded to Lucene 3.0.3.

Hopefully we will see the problem re-occur and be able to capture the necessary 
output to track down the problem.

I've also added a call to writer.prepareCommit(). Previously, only 
writer.commit() was being called. Could that have an effect?

Cheers,
Nick


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982597#action_12982597
 ] 

Michael McCandless commented on LUCENE-2666:


OK thanks.  Hopefully we can catch this under infoStream's watch.

Not calling prepareCommit is harmless -- IW simply calls it for you under the 
hood when commit() is called, if you hadn't already called prepareCommit().

The two APIs are separate in case you want to involve Lucene in a two-phase 
commit with other resources.
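
As a rough sketch of that two-phase pattern: every participant is prepared first, and the final commit runs only if all prepares succeed. `TwoPhaseResource`, `RecordingResource` and `TwoPhaseCoordinator` are hypothetical names for illustration. IndexWriter has prepareCommit(), commit() and rollback() but does not implement this interface, so a real setup would need an adapter that also deals with its checked IOExceptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for this sketch only; not a Lucene type.
interface TwoPhaseResource {
    void prepareCommit();
    void commit();
    void rollback();
}

// Illustrative stand-in for a real resource; records the calls it receives.
final class RecordingResource implements TwoPhaseResource {
    final List<String> calls = new ArrayList<String>();
    private final boolean failOnPrepare;

    RecordingResource(boolean failOnPrepare) {
        this.failOnPrepare = failOnPrepare;
    }

    public void prepareCommit() {
        calls.add("prepare");
        if (failOnPrepare) {
            throw new IllegalStateException("prepare failed");
        }
    }

    public void commit() {
        calls.add("commit");
    }

    public void rollback() {
        calls.add("rollback");
    }
}

final class TwoPhaseCoordinator {
    // Phase 1 prepares every resource; phase 2 commits only if all prepared.
    // When any prepare fails, every attempted resource is rolled back and
    // the method returns false.
    static boolean commitAll(List<? extends TwoPhaseResource> resources) {
        List<TwoPhaseResource> attempted = new ArrayList<TwoPhaseResource>();
        try {
            for (TwoPhaseResource r : resources) {
                attempted.add(r);
                r.prepareCommit();
            }
        } catch (RuntimeException e) {
            for (TwoPhaseResource r : attempted) {
                r.rollback();
            }
            return false;
        }
        for (TwoPhaseResource r : resources) {
            r.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        List<RecordingResource> resources = new ArrayList<RecordingResource>();
        resources.add(new RecordingResource(false));
        resources.add(new RecordingResource(true)); // simulated prepare failure
        System.out.println("committed: " + commitAll(resources));
        System.out.println(resources.get(0).calls);
    }
}
```

The sketch does not handle a failure during the second phase; a production coordinator would need a recovery story for that case.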


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-23 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985474#action_12985474
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi Michael, 

We managed to catch this happening again. I've created a bug for our project 
over at http://jira.atlassian.com/browse/CRUC-5486 (since I can't seem to 
upload the log to this JIRA instance).

My hunch is that this occurs if a search is performed at the same time as a 
re-index, and a Lucene cache is potentially not being closed/cleared correctly.
It appears that a restart of the application causes this problem to go away.

Cheers,
Nick.




[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985833#action_12985833
 ] 

Michael McCandless commented on LUCENE-2666:


Thanks Nick; I'll look at the log.

Aside: you should be able to attach files here... not sure why you saw 
otherwise...

> ArrayIndexOutOfBoundsException when iterating over TermDocs
> ---
>
> Key: LUCENE-2666
> URL: https://issues.apache.org/jira/browse/LUCENE-2666
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.0.2
> Reporter: Shay Banon
>
> A user got this very strange exception, and I managed to get the index that
> it happens on. Basically, iterating over the TermDocs causes an AIOOB
> exception. I easily reproduced it using the FieldCache, which does exactly
> that (the field in question is indexed as numeric). Here is the exception:
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>   at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
>   at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
>   at TestMe.main(TestMe.java:56)
> It happens on the following segment: _26t docCount: 914 delCount: 1 
> delFileName: _26t_1.del
> And as you can see, it smells like a corner case (it fails for document 
> number 912, the AIOOB happens from the deleted docs). The code to recreate it 
> is simple:
> FSDirectory dir = FSDirectory.open(new File("index"));
> IndexReader reader = IndexReader.open(dir, true);
> IndexReader[] subReaders = reader.getSequentialSubReaders();
> for (IndexReader subReader : subReaders) {
>     Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
>     field.setAccessible(true);
>     SegmentInfo si = (SegmentInfo) field.get(subReader);
>     System.out.println("--> " + si);
>     if (si.getDocStoreSegment().contains("_26t")) {
>         // this is the problematic one...
>         System.out.println("problematic one...");
>         FieldCache.DEFAULT.getLongs(subReader, "__documentdate",
>             FieldCache.NUMERIC_UTILS_LONG_PARSER);
>     }
> }
> Here is the result of a check index on that segment:
>   8 of 10: name=_26t docCount=914
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=1.641
>     diagnostics = {optimize=false, mergeFactor=10,
>       os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true,
>       lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge,
>       os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_26t_1.del]
>     test: open reader.........OK [1 deleted docs]
>     test: fields..............OK [32 fields]
>     test: field norms.........OK [32 fields]
>     test: terms, freq, prox...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>   at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
>   at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>   at TestMe.main(TestMe.java:47)
>     test: stored fields.......ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>   at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>   at TestMe.main(TestMe.java:47)
>     test: term vectors........ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>   at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985873#action_12985873
 ] 

Michael McCandless commented on LUCENE-2666:


Nick, the infoStream output looks healthy -- I don't see any exceptions.  Can 
you post the output from CheckIndex against the index that corresponds to this 
infoStream?


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986566#action_12986566
 ] 

Michael McCandless commented on LUCENE-2666:


Hmmm --- given that exception, I would expect CheckIndex to have also seen this 
issue.

Searching at the same time as indexing shouldn't cause this.  Lucene doesn't 
cache postings, but does cache metadata for the term, though I can't see how 
that could lead to this exception.

This could also be a hardware issue?  Do you see the problem on more than one 
machine?

> ArrayIndexOutOfBoundsException when iterating over TermDocs
> ---
>
> Key: LUCENE-2666
> URL: https://issues.apache.org/jira/browse/LUCENE-2666
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.0.2
> Reporter: Shay Banon
> Attachments: checkindex-out.txt
>
>

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-26 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987328#action_12987328
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi Michael,

We have now seen this issue on more than 1 machine. I don't think it is a 
hardware issue.
We are using the ConcurrentMergeScheduler on the writer - so not sure if that 
has known issues?
A restart definitely 'fixes' this problem though.

The stack-trace is:

{code}
java.lang.ArrayIndexOutOfBoundsException: 3740
    at org.apache.lucene.util.BitVector.get(BitVector.java:104)
    at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
    at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
    at org.apache.lucene.search.PhrasePositions.next(PhrasePositions.java:41)
    at org.apache.lucene.search.PhraseScorer.init(PhraseScorer.java:147)
    at org.apache.lucene.search.PhraseScorer.nextDoc(PhraseScorer.java:78)
    at org.apache.lucene.search.DisjunctionSumScorer.initScorerDocQueue(DisjunctionSumScorer.java:101)
    at org.apache.lucene.search.DisjunctionSumScorer.<init>(DisjunctionSumScorer.java:85)
    at org.apache.lucene.search.BooleanScorer2$1.<init>(BooleanScorer2.java:154)
    at org.apache.lucene.search.BooleanScorer2.countingDisjunctionSumScorer(BooleanScorer2.java:149)
    at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorerNoReq(BooleanScorer2.java:218)
    at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorer(BooleanScorer2.java:208)
    at org.apache.lucene.search.BooleanScorer2.<init>(BooleanScorer2.java:101)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:336)
    at org.apache.lucene.search.function.CustomScoreQuery$CustomWeight.scorer(CustomScoreQuery.java:359)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:306)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:210)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:170)
    at org.apache.lucene.search.MultiSearcher$MultiSearcherCallableNoSort.call(MultiSearcher.java:363)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:208)
    at org.apache.lucene.search.Searcher.search(Searcher.java:98)
{code}

I am going to spend some time trying to reproduce this locally today, with a 
debugger attached.

Cheers,
Nick


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-26 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987415#action_12987415
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi Michael, 

We've done some analysis on how we are using Lucene and discovered the 
following:
* the *only* time we construct a new reader
({{IndexReader.open(directory, true)}}) is when we search the index for the
first time after the server starts.
* every other time, we call reader.reopen() whenever we detect that a write
to the index has occurred.
{code}
final IndexReader newReader = oldReader.reopen();
if (newReader != oldReader) {
    oldReader.decRef();
    reader = newReader;
}
{code}
* the bug definitely goes away when the system is restarted and a new Reader is 
instantiated.
* once we see the AIOOBE, it happens on _every search_ until we restart
* running CheckIndex never reports any errors

Therefore we believe that reader.reopen() is most likely causing certain data
structures to be shared between readers, creating an inconsistency that leads
to this exception.
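The reopen-and-swap pattern described above can be sketched without Lucene on the classpath. This is only an illustration of the reference-counting idea, not Lucene's implementation: `FakeReader` and `ReaderHolderSketch` are hypothetical names, and the assumption is that a search should hold its own reference while it runs so that a concurrent swap cannot close the reader underneath it.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ReaderHolderSketch {
    /** Hypothetical stand-in for an index reader that carries a reference count. */
    static final class FakeReader {
        final int generation;
        // Starts at 1: the reference owned by the holder itself.
        private final AtomicInteger refCount = new AtomicInteger(1);

        FakeReader(int generation) {
            this.generation = generation;
        }

        /** Takes a reference unless the count has already dropped to zero. */
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        void decRef() {
            refCount.decrementAndGet();
        }

        boolean isClosed() {
            return refCount.get() == 0;
        }
    }

    private volatile FakeReader current;

    ReaderHolderSketch(FakeReader initial) {
        this.current = initial;
    }

    /** A search takes its own reference before using the reader. */
    FakeReader acquire() {
        while (true) {
            FakeReader reader = current;
            if (reader.tryIncRef()) {
                return reader;
            }
        }
    }

    /** Called by the search when it is done with the reader. */
    void release(FakeReader reader) {
        reader.decRef();
    }

    /** Publishes the new reader first, then drops the holder's reference to the old one. */
    synchronized void swap(FakeReader newReader) {
        FakeReader old = current;
        if (newReader != old) {
            current = newReader;
            old.decRef();
        }
    }
}
```

With this shape, a search that acquired the old reader before the swap keeps a usable reader until it calls `release`, while new searches pick up the new one.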

The latest stack trace we are getting is in the comment above.

Given this information would you have any more clues for us?

Thank you very much for your help so far,
greatly appreciated.
Nick





[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-27 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987829#action_12987829
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi Michael, 

We have a memory dump of the instance that is affected by this. Would you know 
the best place to start looking for the possibly outdated BitVector?
We could make this available to you if you wish - all 1.8GB of it though :( 

Cheers,
Nick




[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988086#action_12988086
 ] 

Michael McCandless commented on LUCENE-2666:


Nick, are you running Lucene w/ asserts enabled?  Are you able to take a src 
patch and run it through your test?  If so, I can add some verbosity/asserts 
and we can try to narrow this down.
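For reference, Java assertions are off by default and are switched on with the `-ea` JVM flag (for example `-ea:org.apache.lucene...` to limit them to Lucene's packages). A minimal, stdlib-only sketch for confirming that the running JVM really has them on; the class name `AssertionCheck` is made up for illustration:

```java
final class AssertionCheck {
    /** Returns true only when the JVM evaluates assert statements for this class. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment is a side effect that runs only if assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Logging this once at server start makes it easy to tell whether an assertion inside Lucene would have had a chance to fire before the exception.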

It does sound like somehow the wrong delete BitVector is getting associated w/
a SegmentReader.

It looks like you don't use NRT readers, right?  I.e., you always .commit() from
IW and then do IR.reopen?
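The numbers in the stack traces fit the "wrong BitVector" theory if one assumes the deleted-docs vector stores one bit per document and looks up the byte at index docId >> 3. Under that assumption, the 114 in the original report is the byte for document 912, and the 3740 in the later trace is the byte for documents 29920 through 29927. The toy class below is a stand-in written for this note, not Lucene's `BitVector`; it only shows how a vector allocated for a smaller segment fails on the array access:

```java
final class DeletedDocsSketch {
    private final byte[] bits;

    /** Allocates one bit per document, rounded up to whole bytes. */
    DeletedDocsSketch(int docCount) {
        this.bits = new byte[(docCount + 7) / 8];
    }

    /** Index of the byte that holds the bit for docId. */
    static int byteIndex(int docId) {
        return docId >> 3;
    }

    /** No bounds check of its own: an oversized docId fails on the array access. */
    boolean isDeleted(int docId) {
        return (bits[byteIndex(docId)] & (1 << (docId & 7))) != 0;
    }

    public static void main(String[] args) {
        System.out.println("byte index for doc 912: " + byteIndex(912));

        // Sized for 914 docs: 115 bytes, so byte 114 exists and the lookup works.
        System.out.println(new DeletedDocsSketch(914).isDeleted(912));

        // Sized for 900 docs: 113 bytes, so byte 114 is out of range.
        try {
            new DeletedDocsSketch(900).isDeleted(912);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds, as in the report");
        }
    }
}
```

A vector sized correctly for the 914-document segment would have 115 bytes, so a failure at byte 114 suggests the vector in use was built for a segment of at most 912 documents.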


[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-02-07 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991315#comment-12991315
 ] 

Nick Pellow commented on LUCENE-2666:
-

Hi Michael, 

This issue was entirely a problem with our code, and I doubt Lucene could have 
done a better job.

The problem was that when upgrading the index (done when fields have changed, 
etc.), we recreate the index in the same location using 
{{IndexWriter.create(directory, analyzer, true, MAX_FIELD_LENGTH)}}.

However, some code had been added just before this call that deleted every 
single file in the directory. This meant that another thread performing a 
search at that moment could have seen a corrupt index, which caused the AIOOBE. 
The developer was paranoid that IndexWriter.create was leaving old files lying 
around.
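
As a rough illustration of why a mismatched deletions file shows up as this particular exception: the report says the failure is on document 912 and the exception index is 114, which is consistent with a one-bit-per-document lookup, since 912 >> 3 = 114. The sketch below is a simplified stand-in written for this note, not Lucene's BitVector source; the class and method names are hypothetical, and the size of 912 documents for the stale bit set is an assumption chosen only to reproduce the numbers in the report.

```java
public class DeletedDocsSketch {

    /** Simplified stand-in for a deleted-docs bit set: one bit per document. */
    static final class Bits {
        private final byte[] bits;

        Bits(int numDocs) {
            // One bit per document, rounded up to whole bytes.
            this.bits = new byte[(numDocs + 7) / 8];
        }

        boolean get(int doc) {
            // No explicit range check: a doc id past the end surfaces as an
            // ArrayIndexOutOfBoundsException on the backing byte array.
            return (bits[doc >> 3] & (1 << (doc & 7))) != 0;
        }
    }

    /** Index of the byte that holds the bit for the given doc id. */
    static int byteIndex(int doc) {
        return doc >> 3;
    }

    /** Returns the first doc id whose lookup fails, or -1 if all lookups succeed. */
    static int firstFailingDoc(Bits deleted, int docCount) {
        for (int doc = 0; doc < docCount; doc++) {
            try {
                deleted.get(doc);
            } catch (ArrayIndexOutOfBoundsException e) {
                return doc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int docCount = 914;              // docCount reported for segment _26t
        Bits stale = new Bits(912);      // assumption: bit set sized for a smaller segment
        Bits matching = new Bits(914);   // bit set sized for the segment actually read

        System.out.println("byte index for doc 912: " + byteIndex(912));
        System.out.println("first failing doc (stale): " + firstFailingDoc(stale, docCount));
        System.out.println("first failing doc (matching): " + firstFailingDoc(matching, docCount));
    }
}
```

With a bit set that matches the segment, every document id below docCount maps to a valid byte, so a failure of this kind suggests the reader combined a segment with a deletions file that was written for a different, smaller set of documents.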

I'm glad we got to the bottom of this, and very glad that it was not a bug 
in Lucene!

Thanks again for helping us track this down.

Best Regards,
Nick Pellow


> ArrayIndexOutOfBoundsException when iterating over TermDocs
> ---
>
> Key: LUCENE-2666
> URL: https://issues.apache.org/jira/browse/LUCENE-2666
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.0.2
> Reporter: Shay Banon
> Attachments: checkindex-out.txt
>
>

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-02-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991545#comment-12991545
 ] 

Michael McCandless commented on LUCENE-2666:


Ahh, thanks for bringing closure, Nick!  Although I'm a little confused about 
how removing files from the index while readers are using it could lead to 
those exceptions...

Note that it's perfectly fine to pass create=true to IW, over an existing 
index, even while readers are using it; IW will gracefully remove the old files 
itself, even if open IRs are still using them. IW just makes a new commit point 
that drops all references to prior segments...

