[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-25 Thread Jan te Beest (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935680#action_12935680
 ] 

Jan te Beest commented on LUCENE-2729:
--

OK, thanks for the info.

> Index corruption after 'read past EOF' under heavy update load and snapshot 
> export
> --
>
>                 Key: LUCENE-2729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2729
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1, 3.0.2
>         Environment: Happens on both OS X 10.6 and Windows 2008 Server.
>                      Integrated with zoie (using a zoie snapshot from 2010-08-06:
>                      zoie-2.0.0-snapshot-20100806.jar).
>            Reporter: Nico Krijnen
>         Attachments: 2010-11-02 IndexWriter infoStream log.zip,
>                      backup_force_failure2.log.zip, eof-extra-logging-4-analysis.txt,
>                      eof-extra-logging-4.log.zip, LUCENE-2729-test1.patch,
>                      read-past-eof-debugging.zip
>
>
> We have a system running lucene and zoie. We use lucene as a content store 
> for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
> backups of the index. This works fine for small indexes and when there are 
> not a lot of changes to the index when the backup is made.
> On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
> is being changed a lot (lots of document additions and/or deletions), we 
> almost always get a 'read past EOF' at some point, followed by lots of 'Lock 
> obtain timed out'.
> At that point we get lots of 0 KB files in the index, data gets lost, and the
> index is unusable.
> When we stop our server, remove the 0 KB files and restart our server, the
> index is operational again, but data has been lost.
> I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. 
> Hopefully someone has some ideas where to look to fix this.
> Some more details...
> Stack trace of the read past EOF and following Lock obtain timed out:
> {code}
> 78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
> java.io.IOException: read past EOF
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>     at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
>     at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
>     at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> 579336 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem copying segments: Lock obtain timed out: org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
>     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
>     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
>     at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
>     at proj.zoie.impl.index

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-24 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935410#action_12935410
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

regarding "I see a patch against RealtimeIndexDataLoader as well. Does it have 
anything to do with the 'load manager stopped' errors I encountered?"

This class hasn't been changed since august.

fatal exception during indexing would kill the indexing thread (load manager 
thread). After any such exception, you will see that error message.
At that point, we cannot recover and it makes no sense to continue indexing.

In Zoie's mbean, there's a HEALTH level value. When that value turns non-0, the 
system needs some serious attention.

You can use JMX to monitor the system health remotely.
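
For reference, a minimal sketch of such a remote check (the service URL, ObjectName, and attribute name below are hypothetical placeholders; use whatever your Zoie instance actually registers):

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ZoieHealthCheck {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
    MBeanServerConnection conn =
        JMXConnectorFactory.connect(url).getMBeanServerConnection();
    // Hypothetical MBean name and attribute; a non-zero health value means
    // the indexing (load manager) thread has died and needs attention.
    Object health = conn.getAttribute(
        new ObjectName("zoie-admin:name=zoie-system"), "Health");
    System.out.println("HEALTH = " + health);
  }
}
{code}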


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-24 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935366#action_12935366
 ] 

Nico Krijnen commented on LUCENE-2729:
--

Thanks for all the help, guys! We greatly appreciate the insights and
thorough responses.

The fixes Xiaoyang Gu made to Zoie have indeed resolved the issue. For
details, see:

http://snaprojects.jira.com/browse/ZOIE-51


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-24 Thread Jan te Beest (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935351#action_12935351
 ] 

Jan te Beest commented on LUCENE-2729:
--

Splendid, it's working like a charm now!

We're using lucene+zoie as the backend of our DAM system (http://elvisready.com).
Most of our clients have multiple wire feeds coming in 24/7, and twice this
issue left a corrupted index behind. That was easily fixed by deleting the
0 KB segments file, but it was really scary nonetheless.

I won't attach the log file as it's rather boring now, but if you want to take
a look, let me know and I'll post it anyway.

Thanks again for your help!


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-24 Thread Jan te Beest (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935256#action_12935256
 ] 

Jan te Beest commented on LUCENE-2729:
--

First of all, many thanks for your support (and your patience)!

{quote}
Are you running with assertions enabled? Maybe we'll catch
something...
{quote}
Well, I had, but somehow that JVM option got lost when switching between git
branches. I'll turn it on again before testing with Xiaoyang's patches merged
into my local git repository.

Looking at the patches, I have a good feeling they will solve the issue.

Xiaoyang, I see a patch against RealtimeIndexDataLoader as well. Does it have
anything to do with the 'load manager stopped' errors I encountered? I should
say, though, that I didn't see those errors when the backup process didn't
fail.

{quote}
Also: can you try upgrading to the latest 3.0.x tip (svn checkout
https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)?
This doesn't sound like any bug we've fixed on 3.0.x but it's worth a
shot 
{quote}
I'll do that anyway, I like being at the tip of the branch :)

{quote}
I see strange output for the backup, eg "Backup to file _7c.prx,
dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" - why
is that? (A number of files show this, but then a bunch of others
look like "correct" sizes). Are you sure the backup copy of
segments_5e is actually valid?
{quote}
We actually see this regularly for files other than segments_*; it seems
harmless though. As for segments_5e being valid at that point: yes, it is.

bq. You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as volatile.
You're totally right, how could I have missed that?
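
(For the record, that is a one-word change to the field declaration; the access modifier below is a guess, the field name is from the quote above:)

{code}
private volatile String lastSnapshotSegmentsFileName;
{code}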

OK, I'll do some more tests and let you know the outcome.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935152#action_12935152
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

Took in Nico's patch for the file channel bug (ZOIE-53); now use commit e6c53ed.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935134#action_12935134
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

At git@github.com:xiaoyang/zoie.git, at commit af19217.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935130#action_12935130
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

I have reproduced the same exception here, and the above patch at least fixes
it in the sense that my reproduction code no longer hits the exception.

I committed a patch on my master branch.

Thanks to everybody working on it.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935121#action_12935121
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

The "rw" mode shouldn't explain anything, since we don't write anything, it 
doesn't affect the state of the file at least for MAC OS.

But the missed deletion policy could cause deletion of files that shouldn't be 
deleted and then the "rw" mode could create some 0 length files.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935118#action_12935118
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

Yes, I was just looking at that part.

Can you test with the following patched method in the DiskSearchIndex class?

  @Override
  protected IndexReader openIndexReaderForDelete() throws IOException {
    Directory directory = _dirMgr.getDirectory(true);
    if (IndexReader.indexExists(directory)) {
      // Pass the snapshot deletion policy so that commits made through this
      // reader cannot delete segments files a running backup still needs.
      return IndexReader.open(directory, _deletionPolicy, false);
    } else {
      return null;
    }
  }



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935082#action_12935082
 ] 

Michael McCandless commented on LUCENE-2729:


bq. We accidentally opened the file in "rw" mode. 

OK, phew :)  So that explains how they became 0 length.

But: can you also check where Zoie opens the IndexReader to apply deletions?  I 
suspect that IndexReader is not being provided the snapshot deletion policy...
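
In the 3.0.x API the deletion policy is threaded through at IndexReader.open time, so the thing to verify is that the delete-applying reader is opened along these lines (a minimal sketch; the directory and policy variables are placeholders, and the policy must be the very same instance guarding the backup):

{code}
// The snapshot policy wraps the normal "keep only last commit" policy:
SnapshotDeletionPolicy policy =
    new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());

// If the reader that applies deletions is opened without this same policy
// instance, its commit is free to delete the snapshotted segments_N file:
IndexReader reader = IndexReader.open(directory, policy, false /* readOnly */);
{code}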


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Xiaoyang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935025#action_12935025
 ] 

Xiaoyang Gu commented on LUCENE-2729:
-

Here's the code in zoie that exports the index, in the DefaultDirectoryManager
class. We accidentally opened the file in "rw" mode, but there are no write
operations on the file. Can you test whether changing it to "r" mode makes any
difference?

Thanks!

  public long transferFromFileToChannel(String fileName, WritableByteChannel channel) throws IOException
  {
    long amount = 0;
    File file = new File(_location, fileName);
    RandomAccessFile raf = null;
    FileChannel fc = null;
    try
    {
      raf = new RandomAccessFile(file, "rw"); // opened read-write, although nothing is ever written
      fc = raf.getChannel();
      long dataLen = fc.size();
      amount += ChannelUtil.writeLong(channel, dataLen);
      amount += fc.transferTo(0, dataLen, channel);
    }
    finally
    {
      try
      {
        if (fc != null) fc.close();
      }
      finally
      {
        if (raf != null) raf.close();
      }
    }
    return amount;
  }
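
The test is then just a one-line change in that method:

{code}
raf = new RandomAccessFile(file, "r"); // read-only: "rw" on a path whose file was just deleted would recreate it as an empty 0-byte file
{code}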



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934913#action_12934913
 ] 

Michael McCandless commented on LUCENE-2729:


On your workaround, I think that should work -- you're having IFD
disregard zero length segments files.

Are you running with assertions enabled?  Maybe we'll catch
something...

I really don't like that "refcount is now -1" -- such a thing should
never happen to a refCount!  But looking at ZoieIDP's sources I think
that's harmless.  It has a finalizer that first sets refCount to 0 and
then calls close...

So the backup completes successfully, after which the segments_5e
exists as a (correct) non-zero file.  Then the snapshot is released,
and at that point, instead of the file being deleted (as we'd expect),
something is somehow making it 0 bytes.

Can you try modifying FSDir.fsync?  First, check if the file
about-to-be-sync'd in fact exists (throw an exception or warn if not);
second, if we are syncing a segments_N file, print its name & full
stack trace?  I wonder if somehow we are syncing these files after
they got deleted...
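
A minimal sketch of that instrumentation against the 3.0.x store API (hedged: LoggingFSDirectory is a made-up name, not part of Lucene or Zoie):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.SimpleFSDirectory;

public class LoggingFSDirectory extends SimpleFSDirectory {
  public LoggingFSDirectory(File path) throws IOException {
    super(path);
  }

  @Override
  public void sync(String name) throws IOException {
    File f = new File(getDirectory(), name);
    if (!f.exists()) {
      // Syncing a file that is already gone would be a smoking gun.
      new Exception("WARNING: sync of missing file " + name).printStackTrace(System.err);
    }
    if (name.startsWith("segments")) {
      // Record who syncs each segments_N file.
      new Exception("sync of segments file " + name).printStackTrace(System.err);
    }
    super.sync(name);
  }
}
{code}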

Also: can you try upgrading to the latest 3.0.x tip (svn checkout
https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)?
This doesn't sound like any bug we've fixed on 3.0.x but it's worth a
shot :)

I see strange output for the backup, eg "Backup to file _7c.prx,
dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" -- why
is that?  (A number of files show this, but then a bunch of others
look like "correct" sizes).  Are you sure the backup copy of
segments_5e is actually valid?

You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as
volatile.

OK, digging through the log... it looks like IW is closed, committing
segments_5f and protecting segments_5e from deletion.  Then an
IndexReader is opened (to do deletes), and it looks like that deletes
segments_5e, because the next IW that's opened no longer sees
segments_5e (even though the backup process is still going).

Can you double check that when Zoie opens an IndexReader for deletions
it's passing the ZoieIDP instance that's protecting segments_5e?

That's a serious problem, but, it cannot explain the zero length
files... unless: does Zoie open a read/write file (against the file it
needs to copy) when it does its backups?



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-23 Thread Jan te Beest (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934857#action_12934857
 ] 

Jan te Beest commented on LUCENE-2729:
--

Hi Michael,

I'm one of Nico's colleagues and I've taken over where he left off.

After a lot of testing and debugging, I have found a workaround for the issue 
and I've collected more logging. 
Hopefully this will help to find a real fix (be it in Zoie or Lucene) instead 
of the dirty hack I've put in the code right now.

Attached is a zip-file with a full log (eof-logging-9.log) and the same log 
stripped down to the period where the backup is happening, i.e. between creating 
the Snapshot and the moment the 'read past EOF' happens 
(stripped-log.log). The other two files are an svn diff against lucene and a git 
patch against zoie.

First let me try to explain what I think is happening.

{noformat}11:50:07,449 [http-8080-exec-4] DEBUG 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Init Snapshot 
segmentsFileName=segments_5e{noformat}
Snapshot is initialized with IndexCommit(segments_5e)

It is not really apparent from this log as I had to restart the server due to 
those "load manager has stopped" ZoieExceptions, but the behaviour is the same 
as in all other test runs I did.
All files are properly written to the .snapshot file, including the segments_5e 
file. 

After the backup completes, the Snapshot is dereferenced.
{noformat}
12:43:06,988 [Finalizer] DEBUG 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - 
Snapshot(IndexFileDeleter.CommitPoint(segments_5e)).refCount=0
java.lang.Exception: 12:43:06.988[Finalizer]

at com.ds.util.log.TraceUtil.stackTrace(TraceUtil.java:20)
at 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.decRef(ZoieIndexDeletionPolicy.java:139)
at 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.close(ZoieIndexDeletionPolicy.java:123)
at 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.finalize(ZoieIndexDeletionPolicy.java:148)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)

12:43:06,991 [Finalizer] DEBUG 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - 
Snapshot(IndexFileDeleter.CommitPoint(segments_5e)), refcount is now -1
{noformat}

Now something is making the segments_5e file 0 bytes; I've not been able to 
figure out why this happens though. I'll try to get some proper logging 
about this.

Where the 'read past EOF' happens is quite clear at this point, however.

When a new IndexFileDeleter is initialized and the directory is scanned for 
segments_* files, i.e.
{noformat}
if (fileName.startsWith(IndexFileNames.SEGMENTS)) {

  // This is a commit (segments or segments_N), and
  // it's valid (<= the max gen).  Load it, then
  // incref all files it refers to:
{noformat}

This piece of code yields the EOF error, which is obvious as the segments_ file 
is 0 bytes.
{noformat}
  SegmentInfos sis = new SegmentInfos();
  try {
sis.read(directory, fileName);
  } catch (FileNotFoundException e) {
{noformat}

I added another check to the {noformat}if 
(fileName.startsWith(IndexFileNames.SEGMENTS)){noformat} condition. However, 
that doesn't solve the problem so much as work around it.
{noformat}
long fileLength = directory.fileLength(fileName);
if (fileName.startsWith(IndexFileNames.SEGMENTS) && fileLength <= 0) {
  message("init: file " + fileName + " is invalid (has no length), skipping further handling");
}

if (fileName.startsWith(IndexFileNames.SEGMENTS) && fileLength > 0) {

  // This is a commit (segments or segments_N), and
  // it's valid (<= the max gen).  Load it, then
  // incref all files it refers to:
{noformat}

We don't see any more errors with this hack (in fact I was able to do 10 
backups of a 21GB index in a row). The segments_ files are deleted from the 
index directory after a while as well, so it's not like they have a positive 
refcount anywhere else (if I understand correctly).

So, somewhere between the Snapshot being closed and a new IndexFileDeleter 
being initialized, something is writing 0 bytes to that segments_* file.
As said before, I have not been able to find the place where this happens. As 
it only happens under heavy load (I had 25 threads reading and writing the 
index besides backing up the index to the snapshot file), I'm guessing it may 
be some synchronization issue?

Perhaps you have an idea now; if you want me to put in more logging or do 
something else to clear it up, just give me a hint.

Cheers

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932719#action_12932719
 ] 

Michael McCandless commented on LUCENE-2729:


Can you print the full stack trace when files are opened for writing?  Ie we 
need to know who wrote the zero length segments_2q.

Also, can you post "svn diff" output against Lucene's sources?  This would help 
me understand the log file.

Did you instrument each file-copy being done by the backup process?  And, when 
the backup process starts?

On the next run, can you enable the static infoStream in SegmentInfos.java?  
This will print details about how it's finding the segments file.
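
(For reference, assuming the static setter in 3.0.x, that's a one-liner at startup:)

{code}
// Hedged sketch: route SegmentInfos' static infoStream to stdout.
org.apache.lucene.index.SegmentInfos.setInfoStream(System.out);
{code}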

Also, can you avoid turning on infoStream for the IndexWriters against the 
RAMDirs?  This should cut down drastically on the log file.

Lucene normally should never re-use a segments file -- the generation only ever 
increases on each commit to the index.  Yes, SegmentInfos.read sets the 
generation in SegmentInfos, but this is only called once, upon creating a new 
SegmentInfos().  It should never be overwritten later by Lucene calling read 
again.

So long as the deletion policy is keeping that commit point alive 
(segments_2q), Lucene should not be deleting it.

I sometimes see "writing <file>", eg:

{noformat}
13:27:02,554 [http-8080-2] TRACE 
com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - 
Directory[lockID=lucene-23f3e69abbd639b3b4a1db928c1b323d, filename=assetIndex] 
writing _5t.tvd, directory contents: _2q.fdt(size=969134416, 
modified=16:41:57,000)
{noformat}

and other times "creating output":

{noformat}
13:26:51,732 
[proj.zoie.impl.indexing.internal.realtimeindexdataloa...@39f46204] DEBUG 
org.apache.lucene.store.SimpleFSDirectory - Creating output for _66.tvx
{noformat}

Why the difference?  Like where in the code are you printing these?


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931081#action_12931081
 ] 

Michael McCandless commented on LUCENE-2729:


bq. I'll see if we can also add some logging to detect (including stack-trace) 
who opened a file but never closed it...

Actually you can use the MockDirectoryWrapper in Lucene's unit tests for this 
-- it tracks open inputs & outputs and then throws an exception on close if 
anything is still open.  You should modify it to print all open inputs/outputs 
instead of throwing an exception, for this usage.
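
For reference, a minimal sketch of that idea against the 3.0.x store API (hedged: TrackingFSDirectory is a made-up name, and only outputs are tracked here, since the zero-length files are written by leaked outputs):

{code}
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.SimpleFSDirectory;

public class TrackingFSDirectory extends SimpleFSDirectory {

  // Remembers, for each still-open IndexOutput, the stack trace that opened it.
  private final Map<IndexOutput, Exception> open =
      Collections.synchronizedMap(new IdentityHashMap<IndexOutput, Exception>());

  public TrackingFSDirectory(File path) throws IOException {
    super(path);
  }

  @Override
  public IndexOutput createOutput(String name) throws IOException {
    final IndexOutput out = super.createOutput(name);
    open.put(out, new Exception("opened " + name));
    return new IndexOutput() {  // thin wrapper that deregisters on close
      public void writeByte(byte b) throws IOException { out.writeByte(b); }
      public void writeBytes(byte[] b, int off, int len) throws IOException {
        out.writeBytes(b, off, len);
      }
      public void flush() throws IOException { out.flush(); }
      public void close() throws IOException { open.remove(out); out.close(); }
      public long getFilePointer() { return out.getFilePointer(); }
      public void seek(long pos) throws IOException { out.seek(pos); }
      public long length() throws IOException { return out.length(); }
    };
  }

  // Call at interesting moments (e.g. before Directory.close) to see leaks.
  public void dumpOpenOutputs() {
    synchronized (open) {
      for (Exception e : open.values()) {
        e.printStackTrace(System.err);
      }
    }
  }
}
{code}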


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-11 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931049#action_12931049
 ] 

Nico Krijnen commented on LUCENE-2729:
--

About 'segments_2j' occurring multiple times: I forgot to mention that we have 
two indexes operating at the same time... one for assets, one for relations.
So adding the directory + RAM/disk index to the logging sounds like a very good 
plan indeed.

About the 0-bytes-on-close detection logs: it seems we log these a few lines too 
early (before super.close is called instead of after) - whoops... will correct 
this and run again.

{quote}
It looks like this 0 length files are being opened (by something) but then 
never closed.
{quote}

I'll see if we can also add some logging to detect (including stack-trace) who 
opened a file but never closed it...

{quote}
EG are you sure you're running the JRE with assertions enabled?
{quote}

Pretty sure it is not; we'll turn assertions on.
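
(For the record: assertions are enabled by passing -ea to the JVM, optionally scoped to Lucene's packages with -ea:org.apache.lucene...; the placeholder below stands for our existing server arguments.)

{noformat}
java -ea <existing server arguments>
{noformat}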

{quote}
when you print out the full file listing, in addition to the size of each file, 
also print out the .fileModified result?
{quote}

Will do.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931044#action_12931044
 ] 

Michael McCandless commented on LUCENE-2729:


When you print the TRACE output saying that a file is being written, can you 
also print which dir instance we are "in"?  I can see that suddenly 
segments_2j is opened for writing at 11:03:27,653, but it's not clear which 
directory that is.  Maybe print the thread's stack trace at that point?

It looks like these 0-length files are being opened (by something) but then 
never closed.

Also: I see lines with "Writing _2q.fdx" where the directory contents on that 
line (and lines above it) show _2q.fdx clearly exists.  Yet, you had added an 
assert to check this very case but the assert never tripped?  So something is 
wrong.  EG are you sure you're running the JRE with assertions enabled?

Maybe when you print out the full file listing, in addition to the size of each 
file, also print out the .fileModified result?  This can help us confirm that 
files are getting overwritten.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931034#action_12931034
 ] 

Michael McCandless commented on LUCENE-2729:


Also, I like that you detect 0-bytes on close!  However, I'm not sure it's 
working right.  EG line 1,017,585 has this:
{noformat}
java.lang.Exception: 11:02:18,639 
[proj.zoie.impl.indexing.internal.realtimeindexdataloa...@7636d59a] FOUND ONE! 
closing 0kb IndexOutput, file: _65.fnm, file.length():0, bytesWritten: 0
{noformat}

But then shortly thereafter (line 1,017,604) I see that _65.fnm is 1749 bytes 
long.  Maybe the code you added for this warning is failing to take into 
account the buffered bytes?  Ie, instead of calling file.length() you should 
call IndexOutput.getFilePointer()?  Hmm, but if you call file.length() after 
the file really is closed... then that ought to be correct.
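
A sketch of that suggestion (hedged: 'out' and 'file' are hypothetical names for the IndexOutput being closed and its underlying file, not anything from the attached patch):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.IndexOutput;

final class ZeroLengthCheck {
  static void closeAndCheck(IndexOutput out, File file) throws IOException {
    long written = out.getFilePointer();  // counts still-buffered bytes too
    out.close();                          // flushes the buffer to disk
    if (written == 0 || file.length() == 0) {
      // file.length() is only trustworthy after close has flushed
      System.err.println("FOUND ONE! closed 0kb IndexOutput, file: "
          + file.getName() + ", bytesWritten: " + written
          + ", file.length(): " + file.length());
    }
  }
}
{code}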


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931031#action_12931031
 ] 

Michael McCandless commented on LUCENE-2729:


I'll look at the new log, but, quickly one correction: Zoie writes first into a 
RAMDir, and then periodically uses addIndexesNoOptimize to fold that RAM index 
into the "real" one.

So you have to be careful when you "grep" because for segments_2j, all of the 
'delete "segments_2j"' look like they are in the RAMDir (the thread is 
ConsumerThread), and all but one of the 'wrote segments file "segments_2j"' are 
also to the RAMDir.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-11 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931024#action_12931024
 ] 

Nico Krijnen commented on LUCENE-2729:
--

We finished another run today and got some interesting results from the 
additional logging. The log times are searchable in the log file...

- 09:40:21,996: we are creating a snapshot for our backup; segments filename 
used for the snapshot: 'segments_2j'.
On the next line we can see that at that moment segments_2j is 2353 bytes. 
We log the filenames in the index + their file sizes (in parentheses).

- 11:03:27,653: segments_2j is written to disk.
_How can this be, when this is the segments filename that was used to create 
our Snapshot?_
On the next line we see that it has indeed been written to disk, but with a 0 
byte file size.

- 11:08:02,731: First "read past EOF" exception.
_It looks like segments filenames are reused: at a point in time well after 
our Snapshot is created, the name 'segments_2j' is used again?
This somehow causes the segments file to end up with a size of 0 bytes. Perhaps 
it is being deleted and written at the same time, or because of the name clash 
there's no data to be written to the segments file?_

- 11:16:03,300: our backup is finished.

Here is an excerpt of some of the more interesting log entries:

{quote}
09:39:30,719 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Start 
asset index backup

09:40:21,996 [http-8080-2] DEBUG 
proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Ctor Snapshot. 
segments filename: *segments_2j*, commit filenames: [_5f.tvd, _4c.tvf, _5f.tvf, 
_4d.fdx, _4p.tis, _4c.tvd, _4p.frq, _5e.tvx, _39.tis, _3u.frq, _4d.fdt, 
_4p.tii, _5f.fnm, _4c.tvx, _39.tii, _55.tis, _2q.fnm, _4p.fdx, _2q.tvf, 
_5e.tvf, _5e.tvd, _2q.tvd, _5f.tvx, _4p.fdt, _55.tii, _55.nrm, _55.frq, 
_2q.tvx, _3u.tvx, _5f.fdx, _4c.frq, _5f.fdt, _4c.fdx, _2q.nrm, _4c.fdt, 
_4p.tvf, _4c.nrm, _3v.tvf, _4c.tii, _3u.prx, _4p.tvd, _4c.tis, _3u.fdt, 
_5e.fnm, _3v.tvd, _3u.fdx, _55.tvx, _39.fdt, _39.fnm, _5e.prx, _3u.tis, 
*segments_2j*, _3v.prx, _2q.fdx, _4c.prx, _4d.prx, _4p.tvx, _39.fdx, _3u.tii, 
_2q.fdt, _4d.frq, _55.fdt, _2q.prx, _5e.tii, _3u.fnm, _55.fdx, _5e.nrm, 
_5f.tii, _5e.tis, _4d.nrm, _4d.tii, _5e.fdt, _4d.tis, _5f.frq, _5e.fdx, 
_39.tvd, _39.tvf, _55.fnm, _3v.frq, _2q.frq, _2q.tii, _3v.tii, _5f.prx, 
_4p.nrm, _5f.nrm, _2q.tis, _3v.tvx, _4c.fnm, _3v.nrm, _55.tvf, _55.tvd, 
_3u.nrm, _5f.tis, _3v.tis, _3v.fdt, _4d.fnm, _4d.tvf, _3v.fdx, _4d.tvd, 
_4d.tvx, _4p.prx, _3v.fnm, _5e.frq, _39.prx, _39.tvx, _39.frq, _4p.fnm, 
_39.nrm, _3u.tvf, _55.prx, _3u.tvd]

11:03:27,653 [http-8080-2] TRACE 
com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Writing 
*segments_2j*, directory contents: _2q.fdt(969134416), _2q.fdx(36652), 
_2q.fnm(276), _2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), 
_2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081), 
_2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), 
_39.frq(17754579), _39.nrm(256), _39.prx(121067407), _39.tii(137511), 
_39.tis(11726653), _39.tvd(185), _39.tvf(233037042), _39.tvx(2020), 
_3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), 
_3u.nrm(15926), _3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), 
_3u.tvd(14209), _3u.tvf(986370136), _3u.tvx(127380), _3v.fdt(2057147455), 
_3v.fdx(476), _3v.fnm(384), _3v.frq(1520), _3v.nrm(122), 
_3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), 
_3v.tvf(211902147), _3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), 
_4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896), _4c.prx(590255960), 
_4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779), 
_4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), 
_4d.frq(24581614), _4d.nrm(140), _4d.prx(158243133), _4d.tii(141948), 
_4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970), _4d.tvx(1092), 
_4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56), 
_4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), 
_4p.tvf(116715012), _4p.tvx(420), _55.fdt(787581180), _55.fdx(156), 
_55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985), 
_55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), 
_55.tvx(308), _5e.prx(0), _5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), 
_5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823), _5z.tii(146115), 
_5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), 
_63.fdt(212690811), _63.fdx(8188), _63.fnm(1652), _63.frq(2492889), 
_63.nrm(2050), _63.prx(5672093), _63.tii(8470), _63.tis(689993), _63.tvd(1442), 
_63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484), 
_64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), 
_64.tii(143860), _64.tis(12491845), _64.tvd(295), _64.tvf(444939185), 
_64.tvx(2964), _65.fdt(1369489408), _65.fdx(0), _65.fnm(1749), 
index.directory(5), segments.gen(20), segments

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930579#action_12930579
 ] 

Michael McCandless commented on LUCENE-2729:


bq. Will do, but we also got the error while zoie was opening a new IndexWriter:

You're right... the next attempt to open an IndexWriter, or an IndexReader 
that's trying to commit pending deletes, will hit the exc due to the 
zero-length segments file.

So in fact this already helps us further bracket when the truncation happened, 
because IW 115 was able to successfully open on the assetIndex, but then the 
attempt to write deletions hits the exc, as well as the next attempt to open a 
writer on the index.

So sometime after IW 115 is opened and before the deletions are applied, 
something truncates these index files.

bq. My guess is that the 'read past EOF' is not really specific to applying 
deletes, but just happens when the SegementInfos is loaded on a 0kb file.

That's right.  What's really odd is that these 0-length files are very old, ie, 
indexing has long ago moved on to newer segments.  I suspect they are in fact 
the files protected by the snapshot for backups... so I think it's important we 
instrument Zoie's backup process next.

{quote}
We ran a test with your patch to throw a RuntimeException when an output 
already exists.
We did get a 'read past EOF', but the additional RuntimeException is never 
thrown.
{quote}
Hmm.  So it sounds like it's not Lucene's SimpleFSDir.createOutput that's 
truncating these files.


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930186#action_12930186
 ] 

Nico Krijnen commented on LUCENE-2729:
--

We really appreciate the help and understand that this is not the only thing 
you are working on ;-)
Collecting post-it notes sounds so familiar :)

We ran a test with your patch to throw a RuntimeException when an output 
already exists.
We did get a 'read past EOF', but the additional RuntimeException is never 
thrown.

We'll add the other log points and do another test run with those. If you have 
more suggestions for logging, let us know; we won't start the next run until 
tomorrow anyway...


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930142#action_12930142
 ] 

Nico Krijnen commented on LUCENE-2729:
--

{quote}
...spooky "other" exceptions...
{quote}

These are all 'WARN' level, and all of them are caused by non-critical timeouts 
in our code - a consequence of the system being under the very heavy load 
needed to reproduce the bug.


{quote}
Would it be possible to instrument to Zoie code to note as the backup
process is copying each file in the snapshot, and at that point print
a listing of the directory?
{quote}

Will do, that is a good one. Then we know which files are being 'held' by the 
Zoie deletion policy for the backup.
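
For instance (a hedged sketch; the method name is ours, and 'dir' is the index Directory being backed up):

{code}
import java.io.IOException;
import org.apache.lucene.store.Directory;

final class BackupLogging {
  // Before copying each file in the snapshot, dump the directory listing
  // with per-file sizes, so we can see exactly when a file shrinks to 0.
  static void logListing(Directory dir, String copying) throws IOException {
    StringBuilder sb = new StringBuilder("Backup copying " + copying + "; contents:");
    for (String name : dir.listAll()) {
      sb.append(' ').append(name).append('(').append(dir.fileLength(name)).append(')');
    }
    System.out.println(sb);
  }
}
{code}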


{quote}
Also, can you write to the log when Zoie applies deletes? (Looks like
it happens in proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs).
It's on applying deletes that the corruption is first detected, so, if
we log this event we can better bracket the period of time when the
corruption happened.
{quote}

Will do, but we also got the error while zoie was opening a new IndexWriter:

{code}
15:25:03,453 
[proj.zoie.impl.indexing.internal.realtimeindexdataloa...@3d9e7719] 
ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
Problem copying segments: read past EOF
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
at 
org.apache.lucene.index.IndexFileDeleter.(IndexFileDeleter.java:170)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:960)
at 
proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
at 
proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
at 
proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
at 
proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
at 
proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}

I'll add a log there too.
My guess is that the 'read past EOF' is not really specific to applying 
deletes, but simply happens whenever the SegmentInfos is loaded from a 0 kb file.
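
For reference, a small self-contained sketch (against the Lucene 3.0 API) 
showing that reading the first int from a zero-length file fails in exactly 
this way; the directory path is made up for the example:

{code}
import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput;

public class ReadPastEofDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/tmp/eof-demo"));  // hypothetical path
        dir.createOutput("segments_1").close();  // simulate a 0 kb segments file
        IndexInput in = dir.openInput("segments_1");
        // SegmentInfos.read() starts with readInt(); on an empty file this goes
        // through BufferedIndexInput.refill() and throws "read past EOF".
        in.readInt();
    }
}
{code}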


{quote}
Does Zoie ever open an IndexReader or IndexWriter passing in an
existing commit point? Or does it always open the latest commit?
{quote}

I'll try to find out.
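
In case it helps the discussion, this is my understanding of the two options 
in the Lucene 3.0 API (a sketch, not Zoie code):

{code}
import java.util.Collection;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

public class CommitPointSketch {
    // Opens whatever the latest commit in the directory is (read-only).
    static IndexReader openLatest(Directory dir) throws Exception {
        return IndexReader.open(dir, true);
    }

    // Pins the reader to one specific, possibly older, commit point.
    static IndexReader openSpecificCommit(Directory dir) throws Exception {
        Collection<IndexCommit> commits = IndexReader.listCommits(dir);
        IndexCommit commit = commits.iterator().next();  // pick one surviving commit
        return IndexReader.open(commit, true);
    }
}
{code}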


{quote}
The timestamps on the zero length files are particularly spooky - the
earliest ones are 15:21 (when first EOF is hit), but then also 15:47
and 15:49 on the others. It seems like on 3 separate occasions
something truncated the files.
{quote}

Indeed, I thought this was weird too.

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930130#action_12930130
 ] 

Michael McCandless commented on LUCENE-2729:



Thank you for attaching the IW infoStream output!  Sorry it took so
long for me to respond.

Aside: it is sad but there is no master TODO list in open source.  It
all comes down to our own email inboxes, todo lists, post-it notes all
over the place, etc., and (in my case anyway) things sometimes fall
past the event horizon.

So please if I don't respond in a day or two on an active issue, bump
it again (put a comment on the issue)!  I'd much rather people
over-nag than under-nag but unfortunately under-nag is far far more
common and it causes important issues to languish unnecessarily.

OK back to the issue :)

I looked through the infoStream but I don't see a smoking gun. I.e., the logs 
indicate that nowhere did Lucene try to delete/overwrite those zero-length 
files; I see other files being deleted, so this is what I'd expect given that 
ZoieDeletionPolicy is presumably protecting the segments_3t commit point (to 
back up its files).

I do see some spooky "other" exceptions, though... these are the first
2 exceptions I see in the log:

{noformat}
14:27:41,290 [bigIndexBuilder_QueueProcessor_3] WARN  
com.ds.acm.logic.impl.AssetManagerImpl -
  Ignoring AssetNotFoundException trying to make sure all metadata from index 
is loaded before updating an existing asset
Exception in thread "pool-5-thread-6" java.lang.NullPointerException
at 
org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:430)
at 
org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:784)
at 
org.apache.coyote.http11.InternalNioOutputBuffer.flush(InternalNioOutputBuffer.java:300)
at 
org.apache.coyote.http11.Http11NioProcessor.action(Http11NioProcessor.java:1060)
at org.apache.coyote.Response.action(Response.java:183)
at 
org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:314)
at 
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
at org.apache.catalina.connector.Response.flushBuffer(Response.java:548)
at 
org.apache.catalina.connector.ResponseFacade.flushBuffer(ResponseFacade.java:279)
at 
org.granite.gravity.AbstractChannel.runReceived(AbstractChannel.java:251)
at 
org.granite.gravity.AbstractChannel.runReceive(AbstractChannel.java:199)
at org.granite.gravity.AsyncReceiver.doRun(AsyncReceiver.java:34)
at 
org.granite.gravity.AsyncChannelRunner.run(AsyncChannelRunner.java:52)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
{noformat}

and

{noformat}
14:40:18,382 [Low Memory Detector] WARN  
com.ds.acm.engine.search.zoieimpl.core.ZoieSystemManager -
  Ignoring timeout while attempting to flush zoie memory index to disk to free 
memory
proj.zoie.api.ZoieException: sync timed out
at 
proj.zoie.impl.indexing.AsyncDataConsumer.syncWthVersion(AsyncDataConsumer.java:177)
at 
proj.zoie.impl.indexing.AsyncDataConsumer.flushEvents(AsyncDataConsumer.java:155)
at proj.zoie.impl.indexing.ZoieSystem.flushEvents(ZoieSystem.java:308)
at 
com.ds.acm.engine.search.zoieimpl.core.ZoieSystemManager.onLowMemory(ZoieSystemManager.java:220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.ds.util.event.BasicEventBroadcaster$Handler.invokeMethod(BasicEventBroadcaster.java:197)
at 
com.ds.util.event.BasicEventBroadcaster$Handler.handle(BasicEventBroadcaster.java:190)
at 
com.ds.util.event.BasicEventBroadcaster.fire(BasicEventBroadcaster.java:108)
at 
com.ds.util.cache.LowMemoryWarningBroadcaster$1.handleNotification(LowMemoryWarningBroadcaster.java:135)
at 
sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
at 
sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
at sun.management.Sensor.trigger(Sensor.java:120)
{noformat}

That 2nd exception happens a total of 9 times... and is rather
spooky.  What does it mean?  I.e., why is Zoie timing out on flushing
the index to disk, and what does it then do with its RAMDir?

I also see a lot of these:

{noformat}
15:50:18,856 [bigIndexBuilder_QueueProcessor_10] WARN

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930094#action_12930094
 ] 

Nico Krijnen commented on LUCENE-2729:
--

Thx! We will update, patch and re-run the test.

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930092#action_12930092
 ] 

Simon Willnauer commented on LUCENE-2729:
-

Ah, I think I have been on the wrong branch, never mind!

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930091#action_12930091
 ] 

Simon Willnauer commented on LUCENE-2729:
-

bq. at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
I just had a quick look at it and I wonder which revision you are using for that? 
SegmentInfos.java does not contain a read call at that line since [revision 892992 | 
https://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/index/SegmentInfos.java?revision=892992&view=markup].

Can you clarify?

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-09 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930086#action_12930086
 ] 

Nico Krijnen commented on LUCENE-2729:
--

Any ideas on what could be happening? It sounds like IndexWriter is the only 
one that is modifying these files; zoie only seems to be reading from them to 
make the backup. What should we look for in the IndexWriter's infoStream?

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-08 Thread John Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929702#action_12929702
 ] 

John Wang commented on LUCENE-2729:
---

Zoie does not touch index files; it only adds an index.directory file 
containing version information.

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927043#action_12927043
 ] 

Michael McCandless commented on LUCENE-2729:


Somehow we have to locate the event that causes the truncation of the files.

Can you enable IndexWriter's infoStream and then get the corruption to happen, 
and post the results?
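
For reference, enabling it looks roughly like this against the 3.0 API; the 
index path and analyzer choice are just for illustration:

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamSetup {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/path/to/index"));  // hypothetical path
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Every flush/merge/commit/file-deletion decision gets logged here.
        writer.setInfoStream(new PrintStream(new FileOutputStream("iw-infostream.log"), true));
        // ... index as usual, then writer.close()
    }
}
{code}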

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927040#action_12927040
 ] 

Nico Krijnen commented on LUCENE-2729:
--

In the meantime, we also did a test with a checkout of the latest lucene_3_0 
branch (@2010-11-01), which should include the fix that Jason mentions.

It does not seem to make a difference though; we still get a 'read past EOF'.

On the last run we did get a slightly different stack trace. This time the 'read 
past EOF' happens when the zoie RAM index is written to the zoie disk index. 
Last time it occurred a little earlier, in BaseSearchIndex#loadFromIndex, while 
committing deletes to the disk IndexReader. This could be just a coincidence 
though. My feeling is still that the 'read past EOF' is just a result/symptom 
of something else that happened just before it - still trying to figure out 
what that could be...

{code}
15:25:03,453 
[proj.zoie.impl.indexing.internal.realtimeindexdataloa...@3d9e7719] 
ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
Problem copying segments: read past EOF
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
at 
org.apache.lucene.index.IndexFileDeleter.&lt;init&gt;(IndexFileDeleter.java:170)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
at org.apache.lucene.index.IndexWriter.&lt;init&gt;(IndexWriter.java:960)
at 
proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
at 
proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
at 
proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
at 
proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
at 
proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927031#action_12927031
 ] 

Jason Rutherglen commented on LUCENE-2729:
--

Using Solr 1.4.2, on disk full, .del files were being written with a file 
length of zero; however, that is supposed to be fixed by 
https://issues.apache.org/jira/browse/LUCENE-2593. This doesn't appear to be 
the same issue, because more than just the .del files are of zero length.

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926908#action_12926908
 ] 

Nico Krijnen commented on LUCENE-2729:
--

bq. Was there no original root cause here? Eg disk full?

This was one of the first things I thought, but the disk has more than enough 
free space: 200 GB. Also, for this test we write the backup to a different disk 
- both for better performance and to prevent the disk with the index on it from 
running out of free space.

bq. Or is the "read past EOF" on closing an IndexReader w/ pending deletes 
really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional 
logging, but we have not been able to find anything weird happening before 
this error. I do expect something weird must have happened to cause the 'read 
past EOF'.

Do you have any clues as to what we could look for? That might narrow the 
search. We are able to reproduce this consistently in our test environment, so 
if you can point to specific debug logging that should be turned on, we can do 
another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. As far as I can see, the basic backup procedure is to 
grab the last 'commit snapshot', prevent it from being deleted 
(ZoieIndexDeletionPolicy), and write all the files mentioned in the commit 
snapshot to a NIO WritableByteChannel.
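
For concreteness, here is roughly what that procedure looks like when sketched 
against Lucene's stock SnapshotDeletionPolicy (3.0 API); ZoieIndexDeletionPolicy 
plays the equivalent role in zoie, and the destination WritableByteChannel is 
assumed to come from elsewhere:

{code}
import java.io.File;
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.SnapshotDeletionPolicy;

public class HotBackupSketch {
    static void backup(SnapshotDeletionPolicy snapshotter, File indexDir,
                       WritableByteChannel out) throws Exception {
        IndexCommit commit = snapshotter.snapshot();  // pin the commit; its files won't be deleted
        try {
            for (String name : commit.getFileNames()) {
                FileChannel src = new FileInputStream(new File(indexDir, name)).getChannel();
                try {
                    src.transferTo(0, src.size(), out);  // purely read-only copy
                } finally {
                    src.close();
                }
            }
        } finally {
            snapshotter.release();  // let the wrapped deletion policy resume
        }
    }
}
{code}

Nothing in this path writes to the index directory, which is why the truncated 
files are so puzzling.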

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926907#action_12926907
 ] 

Michael McCandless commented on LUCENE-2729:


That long string of length 0 files is very bizarre.

Was there no original root cause here?  Eg disk full?

Or is the "read past EOF" on closing an IndexReader w/ pending deletes really 
the first exception you see?

Does zoie somehow touch the index files?  Taking a backup is fundamentally a 
read-only op on the index, so that process shouldn't by itself truncate index 
files.

Something is somehow reaching in and zero-ing out these files.  I don't think 
Lucene itself would do this.  For example, the series of _6i.XXX zero'd files: 
Lucene writes these files roughly in sequence, so if something bad happened in 
writing the postings, then the .nrm file should not even exist.

So we need to figure out who is truncating these files...


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926859#action_12926859
 ] 

Nico Krijnen commented on LUCENE-2729:
--

A second file listing from another test run, same result: read past EOF

{code}
jteb:assetIndex jteb$ ls -la
total 38739848
drwxr-xr-x  2 jteb  jteb        4964 26 okt 11:51 .
drwxr-xr-x  3 jteb  jteb         204 22 okt 11:42 ..
-rw-r--r--  1 jteb  jteb   969134416 18 okt 16:41 _2q.fdt
-rw-r--r--  1 jteb  jteb       36652 18 okt 16:41 _2q.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:41 _2q.fnm
-rw-r--r--  1 jteb  jteb     4685726 18 okt 16:41 _2q.frq
-rw-r--r--  1 jteb  jteb        9166 18 okt 16:41 _2q.nrm
-rw-r--r--  1 jteb  jteb   393230403 18 okt 16:42 _2q.prx
-rw-r--r--  1 jteb  jteb        7447 18 okt 16:42 _2q.tii
-rw-r--r--  1 jteb  jteb      746299 18 okt 16:42 _2q.tis
-rw-r--r--  1 jteb  jteb        8394 18 okt 16:42 _2q.tvd
-rw-r--r--  1 jteb  jteb   599185081 18 okt 16:42 _2q.tvf
-rw-r--r--  1 jteb  jteb       73300 18 okt 16:42 _2q.tvx
-rw-r--r--  1 jteb  jteb  2061261675 18 okt 16:44 _39.fdt
-rw-r--r--  1 jteb  jteb        1012 18 okt 16:44 _39.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:44 _39.fnm
-rw-r--r--  1 jteb  jteb    17754579 18 okt 16:44 _39.frq
-rw-r--r--  1 jteb  jteb         256 18 okt 16:44 _39.nrm
-rw-r--r--  1 jteb  jteb   121067407 18 okt 16:44 _39.prx
-rw-r--r--  1 jteb  jteb      137511 18 okt 16:44 _39.tii
-rw-r--r--  1 jteb  jteb    11726653 18 okt 16:44 _39.tis
-rw-r--r--  1 jteb  jteb         185 18 okt 16:44 _39.tvd
-rw-r--r--  1 jteb  jteb   233037042 18 okt 16:44 _39.tvf
-rw-r--r--  1 jteb  jteb        2020 18 okt 16:44 _39.tvx
-rw-r--r--  1 jteb  jteb  1595882722 18 okt 16:45 _3u.fdt
-rw-r--r--  1 jteb  jteb       63692 18 okt 16:45 _3u.fdx
-rw-r--r--  1 jteb  jteb         330 18 okt 16:45 _3u.fnm
-rw-r--r--  1 jteb  jteb     8001869 18 okt 16:45 _3u.frq
-rw-r--r--  1 jteb  jteb       15926 18 okt 16:45 _3u.nrm
-rw-r--r--  1 jteb  jteb   647374863 18 okt 16:45 _3u.prx
-rw-r--r--  1 jteb  jteb       11319 18 okt 16:45 _3u.tii
-rw-r--r--  1 jteb  jteb     1168399 18 okt 16:45 _3u.tis
-rw-r--r--  1 jteb  jteb       14209 18 okt 16:45 _3u.tvd
-rw-r--r--  1 jteb  jteb   986370136 18 okt 16:46 _3u.tvf
-rw-r--r--  1 jteb  jteb      127380 18 okt 16:46 _3u.tvx
-rw-r--r--  1 jteb  jteb  2057147455 18 okt 16:47 _3v.fdt
-rw-r--r--  1 jteb  jteb         476 18 okt 16:47 _3v.fdx
-rw-r--r--  1 jteb  jteb         384 18 okt 16:47 _3v.fnm
-rw-r--r--  1 jteb  jteb        1520 18 okt 16:47 _3v.frq
-rw-r--r--  1 jteb  jteb         122 18 okt 16:47 _3v.nrm
-rw-r--r--  1 jteb  jteb   109724024 18 okt 16:47 _3v.prx
-rw-r--r--  1 jteb  jteb      132491 18 okt 16:47 _3v.tii
-rw-r--r--  1 jteb  jteb    11457688 18 okt 16:47 _3v.tis
-rw-r--r--  1 jteb  jteb         114 18 okt 16:47 _3v.tvd
-rw-r--r--  1 jteb  jteb   211902147 18 okt 16:48 _3v.tvf
-rw-r--r--  1 jteb  jteb         948 18 okt 16:48 _3v.tvx
-rw-r--r--  1 jteb  jteb  2691565961 18 okt 16:49 _4c.fdt
-rw-r--r--  1 jteb  jteb       39572 18 okt 16:49 _4c.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:49 _4c.fnm
-rw-r--r--  1 jteb  jteb    18724620 18 okt 16:49 _4c.frq
-rw-r--r--  1 jteb  jteb        9896 18 okt 16:49 _4c.nrm
-rw-r--r--  1 jteb  jteb   590255960 18 okt 16:50 _4c.prx
-rw-r--r--  1 jteb  jteb      141243 18 okt 16:50 _4c.tii
-rw-r--r--  1 jteb  jteb    12185869 18 okt 16:50 _4c.tis
-rw-r--r--  1 jteb  jteb        9894 18 okt 16:50 _4c.tvd
-rw-r--r--  1 jteb  jteb   932649779 18 okt 16:51 _4c.tvf
-rw-r--r--  1 jteb  jteb       79140 18 okt 16:51 _4c.tvx
-rw-r--r--  1 jteb  jteb  2398908136 18 okt 16:52 _4d.fdt
-rw-r--r--  1 jteb  jteb         548 18 okt 16:52 _4d.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:52 _4d.fnm
-rw-r--r--  1 jteb  jteb    24581614 18 okt 16:52 _4d.frq
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.nrm
-rw-r--r--  1 jteb  jteb   158243133 18 okt 16:52 _4d.prx
-rw-r--r--  1 jteb  jteb      141948 18 okt 16:52 _4d.tii
-rw-r--r--  1 jteb  jteb    12259425 18 okt 16:52 _4d.tis
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.tvd
-rw-r--r--  1 jteb  jteb   303769970 18 okt 16:53 _4d.tvf
-rw-r--r--  1 jteb  jteb        1092 18 okt 16:53 _4d.tvx
-rw-r--r--  1 jteb  jteb  1081212027 18 okt 16:53 _4p.fdt
-rw-r--r--  1 jteb  jteb         212 18 okt 16:53 _4p.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:53 _4p.fnm
-rw-r--r--  1 jteb  jteb     8294102 18 okt 16:53 _4p.frq
-rw-r--r--  1 jteb  jteb          56 18 okt 16:53 _4p.nrm
-rw-r--r--  1 jteb  jteb    60513257 18 okt 16:53 _4p.prx
-rw-r--r--  1 jteb  jteb      134898 18 okt 16:53 _4p.tii
-rw-r--r--  1 jteb  jteb    11376730 18 okt 16:53 _4p.tis
-rw-r--r--  1 jteb  jteb          56 18 okt 16:53 _4p.tvd
-rw-r--r--  1 jteb  jteb   116715012 18 okt 16:53 _4p.tvf
-rw-r--r--  1 jteb  jteb         420 18 okt 16:53 _4p.tvx
-rw-r--r--  1 jteb  jteb   787581180 18 okt 16:54 
{code}
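
For context on what the snapshot export is doing while these files are being written: Lucene's supported pattern for copying a live index is a SnapshotDeletionPolicy, which pins the files of a commit point so merges cannot delete them mid-copy. A minimal sketch of that pattern against the 3.0.x API (placeholder path; this is not zoie's actual backup code):

{code}
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Minimal sketch of Lucene's hot-backup pattern (not zoie's actual code):
// SnapshotDeletionPolicy pins the files of a commit so they cannot be
// deleted by merges while a backup copies them.
public class HotBackupSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path; point this at the index to back up.
        FSDirectory dir = FSDirectory.open(new File("/path/to/assetIndex"));
        SnapshotDeletionPolicy sdp =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        IndexWriter writer = new IndexWriter(dir,
            new StandardAnalyzer(Version.LUCENE_30), sdp,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            IndexCommit commit = sdp.snapshot();   // pin the current commit
            try {
                for (String name : commit.getFileNames()) {
                    // Copy each pinned file to the backup location here.
                    System.out.println("would copy: " + name);
                }
            } finally {
                sdp.release();                     // unpin; deletion may resume
            }
        } finally {
            writer.close();
        }
    }
}
{code}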

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926858#action_12926858
 ] 

Nico Krijnen commented on LUCENE-2729:
--

{code}
jteb:assetIndex jteb$ ls -la
total 41550832
drwxr-xr-x  2 jteb  jteb        4862  1 nov 08:52 .
drwxr-xr-x  4 jteb  jteb         238 29 okt 14:10 ..
-rw-r--r--@ 1 jteb  jteb       21508  1 nov 08:52 .DS_Store
-rw-r--r--  1 jteb  jteb   969134416 18 okt 16:41 _2q.fdt
-rw-r--r--  1 jteb  jteb       36652 18 okt 16:41 _2q.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:41 _2q.fnm
-rw-r--r--  1 jteb  jteb     4685726 18 okt 16:41 _2q.frq
-rw-r--r--  1 jteb  jteb        9166 18 okt 16:41 _2q.nrm
-rw-r--r--  1 jteb  jteb   393230403 18 okt 16:42 _2q.prx
-rw-r--r--  1 jteb  jteb        7447 18 okt 16:42 _2q.tii
-rw-r--r--  1 jteb  jteb      746299 18 okt 16:42 _2q.tis
-rw-r--r--  1 jteb  jteb        8394 18 okt 16:42 _2q.tvd
-rw-r--r--  1 jteb  jteb   599185081 18 okt 16:42 _2q.tvf
-rw-r--r--  1 jteb  jteb       73300 18 okt 16:42 _2q.tvx
-rw-r--r--  1 jteb  jteb  1595882722 18 okt 16:45 _3u.fdt
-rw-r--r--  1 jteb  jteb       63692 18 okt 16:45 _3u.fdx
-rw-r--r--  1 jteb  jteb         330 18 okt 16:45 _3u.fnm
-rw-r--r--  1 jteb  jteb     8001869 18 okt 16:45 _3u.frq
-rw-r--r--  1 jteb  jteb       15926 18 okt 16:45 _3u.nrm
-rw-r--r--  1 jteb  jteb   647374863 18 okt 16:45 _3u.prx
-rw-r--r--  1 jteb  jteb       11319 18 okt 16:45 _3u.tii
-rw-r--r--  1 jteb  jteb     1168399 18 okt 16:45 _3u.tis
-rw-r--r--  1 jteb  jteb       14209 18 okt 16:45 _3u.tvd
-rw-r--r--  1 jteb  jteb   986370136 18 okt 16:46 _3u.tvf
-rw-r--r--  1 jteb  jteb      127380 18 okt 16:46 _3u.tvx
-rw-r--r--  1 jteb  jteb  2691565961 18 okt 16:49 _4c.fdt
-rw-r--r--  1 jteb  jteb       39572 18 okt 16:49 _4c.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:49 _4c.fnm
-rw-r--r--  1 jteb  jteb    18724620 18 okt 16:49 _4c.frq
-rw-r--r--  1 jteb  jteb        9896 18 okt 16:49 _4c.nrm
-rw-r--r--  1 jteb  jteb   590255960 18 okt 16:50 _4c.prx
-rw-r--r--  1 jteb  jteb      141243 18 okt 16:50 _4c.tii
-rw-r--r--  1 jteb  jteb    12185869 18 okt 16:50 _4c.tis
-rw-r--r--  1 jteb  jteb        9894 18 okt 16:50 _4c.tvd
-rw-r--r--  1 jteb  jteb   932649779 18 okt 16:51 _4c.tvf
-rw-r--r--  1 jteb  jteb       79140 18 okt 16:51 _4c.tvx
-rw-r--r--  1 jteb  jteb  2398908136 18 okt 16:52 _4d.fdt
-rw-r--r--  1 jteb  jteb         548 18 okt 16:52 _4d.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:52 _4d.fnm
-rw-r--r--  1 jteb  jteb    24581614 18 okt 16:52 _4d.frq
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.nrm
-rw-r--r--  1 jteb  jteb   158243133 18 okt 16:52 _4d.prx
-rw-r--r--  1 jteb  jteb      141948 18 okt 16:52 _4d.tii
-rw-r--r--  1 jteb  jteb    12259425 18 okt 16:52 _4d.tis
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.tvd
-rw-r--r--  1 jteb  jteb   303769970 18 okt 16:53 _4d.tvf
-rw-r--r--  1 jteb  jteb        1092 18 okt 16:53 _4d.tvx
-rw-r--r--  1 jteb  jteb  4118409126 29 okt 16:26 _6g.fdt
-rw-r--r--  1 jteb  jteb        1484 29 okt 16:26 _6g.fdx
-rw-r--r--  1 jteb  jteb         384 29 okt 16:17 _6g.fnm
-rw-r--r--  1 jteb  jteb    35294399 29 okt 16:27 _6g.frq
-rw-r--r--  1 jteb  jteb         374 29 okt 16:27 _6g.nrm
-rw-r--r--  1 jteb  jteb   230791431 29 okt 16:27 _6g.prx
-rw-r--r--  1 jteb  jteb      143860 29 okt 16:27 _6g.tii
-rw-r--r--  1 jteb  jteb    12491845 29 okt 16:27 _6g.tis
-rw-r--r--  1 jteb  jteb         295 29 okt 16:28 _6g.tvd
-rw-r--r--  1 jteb  jteb   444939185 29 okt 16:28 _6g.tvf
-rw-r--r--  1 jteb  jteb        2964 29 okt 16:28 _6g.tvx
-rw-r--r--  1 jteb  jteb  2758122671 29 okt 16:31 _6h.fdt
-rw-r--r--  1 jteb  jteb       96388 29 okt 16:31 _6h.fdx
-rw-r--r--  1 jteb  jteb         723 29 okt 16:29 _6h.fnm
-rw-r--r--  1 jteb  jteb    51142700 29 okt 16:31 _6h.frq
-rw-r--r--  1 jteb  jteb       24100 29 okt 16:31 _6h.nrm
-rw-r--r--  1 jteb  jteb   189178767 29 okt 16:31 _6h.prx
-rw-r--r--  1 jteb  jteb      270472 29 okt 16:31 _6h.tii
-rw-r--r--  1 jteb  jteb    21710405 29 okt 16:31 _6h.tis
-rw-r--r--  1 jteb  jteb       23873 29 okt 16:31 _6h.tvd
-rw-r--r--  1 jteb  jteb   394088075 29 okt 16:31 _6h.tvf
-rw-r--r--  1 jteb  jteb      192772 29 okt 16:31 _6h.tvx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8b.fnm
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8b.tvd
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8b.tvf
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8c.fdt
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8c.fdx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8c.frq
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8c.tii
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8c.tis
-rw-r--r--  1 jteb  jteb           0 29 okt 20:28 _8c.tvf
-rw-r--r--  1 jteb  jteb           0 29 okt 20:30 _8c.tvx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8d.fdt
-rw-r--r--  1 jteb  jteb           0 29 okt 20:25 _8d.fdx
-rw-r
{code}
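
Note that the _8b.* and _8c.* entries above are all zero bytes and stamped hours after the last healthy segment, matching the "lots of 0 kb files" symptom in the description. Which segments are actually damaged can be confirmed with Lucene's CheckIndex tool, run against a copy of the index. A minimal sketch (placeholder path; run against a copy, never the live index):

{code}
import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

// Sketch: inspect a (copy of the) corrupted index and report which
// segments are broken.
public class InspectIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder path; point this at a copy of the affected index.
        FSDirectory dir = FSDirectory.open(new File("/path/to/assetIndex-copy"));
        CheckIndex checker = new CheckIndex(dir);
        checker.setInfoStream(System.out);          // print per-segment details
        CheckIndex.Status status = checker.checkIndex();
        System.out.println(status.clean ? "index is clean" : "index is corrupt");
        dir.close();
    }
}
{code}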

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-10-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926360#action_12926360
 ] 

Jason Rutherglen commented on LUCENE-2729:
--

Post a listing of the index files with their lengths, i.e., ls -la.
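
Since this also reproduces on Windows 2008 Server, where ls is not available, roughly the same listing can be produced through Lucene's Directory API. A minimal sketch (placeholder path):

{code}
import java.io.File;
import org.apache.lucene.store.FSDirectory;

// Sketch: print each index file with its length, the programmatic
// equivalent of 'ls -la' on the index directory.
public class ListIndexFiles {
    public static void main(String[] args) throws Exception {
        // Placeholder path; point this at the affected index directory.
        FSDirectory dir = FSDirectory.open(new File("/path/to/assetIndex"));
        for (String name : dir.listAll()) {
            System.out.println(dir.fileLength(name) + "\t" + name);
        }
        dir.close();
    }
}
{code}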
