[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935680#action_12935680 ]

Jan te Beest commented on LUCENE-2729:
--------------------------------------

OK, thanks for the info.

> Index corruption after 'read past EOF' under heavy update load and snapshot export
> -----------------------------------------------------------------------------------
>
> Key: LUCENE-2729
> URL: https://issues.apache.org/jira/browse/LUCENE-2729
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.0.1, 3.0.2
> Environment: Happens on both OS X 10.6 and Windows 2008 Server. Integrated with zoie (using a zoie snapshot from 2010-08-06: zoie-2.0.0-snapshot-20100806.jar).
> Reporter: Nico Krijnen
> Attachments: 2010-11-02 IndexWriter infoStream log.zip, backup_force_failure2.log.zip, eof-extra-logging-4-analysis.txt, eof-extra-logging-4.log.zip, LUCENE-2729-test1.patch, read-past-eof-debugging.zip
>
> We have a system running lucene and zoie. We use lucene as a content store for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled backups of the index. This works fine for small indexes and when there are not a lot of changes to the index when the backup is made.
>
> On large indexes (about 5 GB to 19 GB), when a backup is made while the index is being changed a lot (lots of document additions and/or deletions), we almost always get a 'read past EOF' at some point, followed by lots of 'Lock obtain timed out'. At that point we get lots of 0 kb files in the index, data gets lost, and the index is unusable.
>
> When we stop our server, remove the 0 kb files and restart our server, the index is operational again, but data has been lost.
>
> I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. Hopefully someone has some ideas where to look to fix this. Some more details...
> Stack trace of the 'read past EOF' and the following 'Lock obtain timed out':
>
> {code}
> 78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085]
> ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
> java.io.IOException: read past EOF
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>     at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
>     at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
>     at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
>
> 579336 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085]
> ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader -
> Problem copying segments: Lock obtain timed out: org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
>     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
>     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
>     at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     ...
> {code}
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935410#action_12935410 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

Regarding "I see a patch against RealtimeIndexDataLoader as well. Does it have anything to do with the 'load manager stopped' errors I encountered?": that class hasn't been changed since August. A fatal exception during indexing kills the indexing thread (the load manager thread), and after any such exception you will see that error message. At that point we cannot recover, and it makes no sense to continue indexing. In Zoie's MBean there is a HEALTH value; when it turns non-zero, the system needs serious attention. You can use JMX to monitor the system health remotely.
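For remote monitoring, a minimal JMX client along these lines could poll that health value. Note that the ObjectName ("zoie:name=zoie-system") and the attribute name ("Health") below are illustrative assumptions; check which names Zoie's MBean actually registers.

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ZoieHealthMonitor {
    public static void main(String[] args) throws Exception {
        // Connect to the remote JVM's JMX agent (standard RMI connector URL).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Hypothetical ObjectName and attribute; substitute Zoie's real names.
            ObjectName zoie = new ObjectName("zoie:name=zoie-system");
            Object health = conn.getAttribute(zoie, "Health");
            // A non-zero health value means the indexing system needs attention.
            System.out.println("Zoie health = " + health);
        } finally {
            connector.close();
        }
    }
}
{code}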
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935366#action_12935366 ]

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

Thanks for all the help, guys! We greatly appreciate the insights and thorough responses. The fixes Xiaoyang Gu made to zoie have indeed resolved the issue. For details, see: http://snaprojects.jira.com/browse/ZOIE-51
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935351#action_12935351 ]

Jan te Beest commented on LUCENE-2729:
--------------------------------------

Splendid, it's working like a charm now!

We're using lucene+zoie as a backend to our DAM system (http://elvisready.com). Most of our clients have multiple wire feeds coming in 24/7, and twice this issue left a corrupted index behind. That was easily fixed by deleting the 0 kb segments file, but it was really scary nonetheless.

I won't attach the log file as it's rather boring now, but if you want to take a look, let me know and I'll post it anyway. Thanks again for your help!
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935256#action_12935256 ]

Jan te Beest commented on LUCENE-2729:
--------------------------------------

First of all, many thanks for your support (and patience with it)!

{quote}
Are you running with assertions enabled? Maybe we'll catch something...
{quote}

Well, I had, but somehow that JVM option got lost when switching between git branches. I'll turn it on again before testing with Xiaoyang's patches merged into my local git repository. Looking at the patches, I have a good feeling they will solve the issue.

Xiaoyang, I see a patch against RealtimeIndexDataLoader as well. Does it have anything to do with the 'load manager stopped' errors I encountered? I have to say, though, that I didn't see those errors when the backup process didn't fail.

{quote}
Also: can you try upgrading to the latest 3.0.x tip (svn checkout https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)? This doesn't sound like any bug we've fixed on 3.0.x but it's worth a shot
{quote}

I'll do that anyway, I like being at the tip of the branch :)

{quote}
I see strange output for the backup, eg "Backup to file _7c.prx, dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" - why is that? (A number of files show this, but then a bunch of others look like "correct" sizes). Are you sure the backup copy of segments_5e is actually valid?
{quote}

We actually encounter this regularly for files other than segments_*; it seems harmless though. As for segments_5e being valid at that point: yes, it is.

bq. You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as volatile.

You're totally right, how could I have missed that? (A sketch of the idiom follows below.)

OK, I'll do some more tests and let you know the outcome.
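The point of the volatile marking is cross-thread visibility: the thread taking the snapshot writes the segments file name while another thread reads it to decide which commit to protect. Below is a minimal sketch of the idiom; the class and field names follow the comment above, but the methods around them are illustrative, not Zoie's actual code.

{code}
public class ZoieIDP /* the IndexDeletionPolicy discussed above */ {
    // Written by the snapshot/backup thread, read by the deletion-policy
    // callbacks on the indexing thread. Without volatile, the reading thread
    // may see a stale value and fail to protect the snapshotted commit.
    private volatile String lastSnapshotSegmentsFileName;

    public void snapshotTaken(String segmentsFileName) {  // hypothetical helper
        lastSnapshotSegmentsFileName = segmentsFileName;
    }

    public boolean isSnapshotted(String segmentsFileName) {  // hypothetical helper
        return segmentsFileName.equals(lastSnapshotSegmentsFileName);
    }
}
{code}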
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935152#action_12935152 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

Took in Nico's patch for the file channel bug (ZOIE-53); now use commit e6c53ed.
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935134#action_12935134 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

At git@github.com:xiaoyang/zoie.git, at commit af19217.
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935130#action_12935130 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

I have already reproduced the same exception here, and the above patch fixes it, at least in the sense that my reproduction code no longer hits the exception. I committed a patch on my master branch. Thanks to everybody working on this.
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935121#action_12935121 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

The "rw" mode by itself shouldn't explain anything: since we never write to the file, it doesn't affect the file's state, at least on Mac OS. But the missing deletion policy could cause deletion of files that shouldn't be deleted, and then the "rw" mode could re-create some of them as 0-length files.
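That trap is easy to demonstrate in isolation: opening a path that no longer exists with RandomAccessFile in "rw" mode silently creates an empty file, while "r" mode throws. Here is a standalone sketch (the file name is only for illustration):

{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;

public class RwModeDemo {
    public static void main(String[] args) throws Exception {
        File deleted = new File("segments_5e");  // pretend IFD just deleted it
        deleted.delete();                        // make sure it does not exist

        // "r" mode fails fast on a missing file.
        try {
            RandomAccessFile r = new RandomAccessFile(deleted, "r");
            r.close();
        } catch (FileNotFoundException expected) {
            System.out.println("\"r\" mode: " + expected.getMessage());
        }

        // "rw" mode silently creates the file: a 0-length segments file that a
        // later SegmentInfos.read() will trip over with "read past EOF".
        RandomAccessFile raf = new RandomAccessFile(deleted, "rw");
        System.out.println("\"rw\" mode created file, length = " + raf.length());
        raf.close();
        deleted.delete();  // clean up
    }
}
{code}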
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935118#action_12935118 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

Yes, I was just looking at that part. Can you test with the following patched method in the DiskSearchIndex class? The change is to pass the snapshot-aware deletion policy into IndexReader.open, so the reader used for applying deletes cannot prune a commit point a backup still depends on.

{code}
@Override
protected IndexReader openIndexReaderForDelete() throws IOException {
  Directory directory = _dirMgr.getDirectory(true);
  if (IndexReader.indexExists(directory)) {
    // Open with the same deletion policy the writer uses, so snapshotted
    // segments files stay protected while this reader commits its deletes.
    return IndexReader.open(directory, _deletionPolicy, false);
  } else {
    return null;
  }
}
{code}
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935082#action_12935082 ]

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

bq. We accidentally opened the file in "rw" mode.

OK, phew :) So that explains how they became 0 length. But: can you also check where Zoie opens the IndexReader to apply deletions? I suspect that IndexReader is not being provided the snapshot deletion policy...
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935025#action_12935025 ]

Xiaoyang Gu commented on LUCENE-2729:
--------------------------------------

Here's the code in zoie that exports the index, in the DefaultDirectoryManager class. We accidentally opened the file in "rw" mode, even though there are no write operations on the file. Can you test whether changing it to "r" mode makes any difference? Thanks!

{code}
public long transferFromFileToChannel(String fileName, WritableByteChannel channel) throws IOException {
  long amount = 0;
  File file = new File(_location, fileName);
  RandomAccessFile raf = null;
  FileChannel fc = null;
  try {
    raf = new RandomAccessFile(file, "rw");  // opened read/write, although we only read
    fc = raf.getChannel();
    long dataLen = fc.size();
    // First write the file length, then stream the file contents to the channel.
    amount += ChannelUtil.writeLong(channel, dataLen);
    amount += fc.transferTo(0, dataLen, channel);
  } finally {
    try {
      if (fc != null) fc.close();
    } finally {
      if (raf != null) raf.close();
    }
  }
  return amount;
}
{code}
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934913#action_12934913 ] Michael McCandless commented on LUCENE-2729:

On your workaround, I think that should work -- you're having IFD disregard zero-length segments files.

Are you running with assertions enabled? Maybe we'll catch something...

I really don't like that "refcount is now -1" -- such a thing should never happen to a refCount! But looking at ZoieIDP's sources I think that's harmless. It has a finalizer that first sets refCount to 0 and then calls close...

So the backup completes successfully, after which segments_5e exists as a (correct) non-zero file. Then the snapshot is released, and at that point, instead of the file being deleted (as we'd expect), something is somehow making it 0 bytes.

Can you try modifying FSDir.fsync? First, check if the file about-to-be-sync'd in fact exists (throw an exception or warn if not); second, if we are syncing a segments_N file, print its name & full stack trace? I wonder if somehow we are syncing these files after they got deleted...

Also: can you try upgrading to the latest 3.0.x tip (svn checkout https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)? This doesn't sound like any bug we've fixed on 3.0.x but it's worth a shot :)

I see strange output for the backup, eg "Backup to file _7c.prx, dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" -- why is that? (A number of files show this, but then a bunch of others look like "correct" sizes). Are you sure the backup copy of segments_5e is actually valid?

You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as volatile.

OK, digging through the log... it looks like IW is closed, committing segments_5f and protecting segments_5e from deletion. Then an IndexReader is opened (to do deletes) and it looks like that deletes segments_5e, because the next IW that's opened no longer sees segments_5e (even though the backup process is still going). Can you double check that when Zoie opens an IndexReader for deletions it's passing the ZoieIDP instance that's protecting segments_5e?

That's a serious problem, but it cannot explain the zero-length files... unless: does Zoie open a read/write file (against the file it needs to copy) when it does its backups?
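For illustration, a minimal sketch of the fsync instrumentation suggested above -- not the actual FSDirectory code, just a helper that could be called right before the real fsync:

{code}
import java.io.File;

public class SyncCheck {
    // Sketch only: call just before fsync'ing fileName inside the index directory
    public static void checkBeforeSync(File indexDir, String fileName) {
        File fullFile = new File(indexDir, fileName);
        if (!fullFile.exists()) {
            // warn (or throw) if we are about to sync a file that is already gone
            new Exception("WARNING: fsync of missing file " + fullFile).printStackTrace(System.err);
        }
        if (fileName.startsWith("segments")) {
            // record who is syncing a segments_N file, with full stack trace
            new Exception("fsync of segments file " + fileName).printStackTrace(System.err);
        }
    }
}
{code}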
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934857#action_12934857 ] Jan te Beest commented on LUCENE-2729: --

Hi Michael, I'm one of Nico's colleagues and I've taken over where he left off. After a lot of testing and debugging, I have found a workaround for the issue and I've collected more logging. Hopefully this will help to find a real fix (be it in Zoie or Lucene) instead of the dirty hack I've put in the code right now.

Attached is a zip-file with a full log (eof-logging-9.log) and the same log stripped down to the period where the backup is happening, i.e. between creating the Snapshot and the moment where the 'read past EOF' happens (stripped-log.log). The other two files are an svn diff against lucene and a git patch against zoie.

First let me try to explain what I think is happening.

{noformat}
11:50:07,449 [http-8080-exec-4] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Init Snapshot segmentsFileName=segments_5e
{noformat}

Snapshot is initialized with IndexCommit(segments_5e). It is not really apparent from this log as I had to restart the server due to those "load manager has stopped" ZoieExceptions, but the behaviour is the same as in all other test runs I did. All files are properly written to the .snapshot file, including the segments_5e file. After the backup completes, the Snapshot is dereferenced.

{noformat}
12:43:06,988 [Finalizer] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Snapshot(IndexFileDeleter.CommitPoint(segments_5e)).refCount=0
java.lang.Exception: 12:43:06.988[Finalizer]
    at com.ds.util.log.TraceUtil.stackTrace(TraceUtil.java:20)
    at proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.decRef(ZoieIndexDeletionPolicy.java:139)
    at proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.close(ZoieIndexDeletionPolicy.java:123)
    at proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.finalize(ZoieIndexDeletionPolicy.java:148)
    at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
    at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
    at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
12:43:06,991 [Finalizer] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Snapshot(IndexFileDeleter.CommitPoint(segments_5e)), refcount is now -1
{noformat}

Now something is making the segments_5e file 0 bytes; I've not been able to figure out why this is happening though. I'll try to get some proper logging about this.

Where the 'read past EOF' happens is quite clear at this point, however: when a new IndexFileDeleter is initialized and the directory is scanned for segments_* files, i.e.

{noformat}
if (fileName.startsWith(IndexFileNames.SEGMENTS)) {
  // This is a commit (segments or segments_N), and
  // it's valid (<= the max gen). Load it, then
  // incref all files it refers to:
{noformat}

This piece of code yields the EOF error, which is obvious as the segments_ file is 0 bytes:

{noformat}
SegmentInfos sis = new SegmentInfos();
try {
  sis.read(directory, fileName);
} catch (FileNotFoundException e) {
{noformat}

I added another check to the {noformat}if (fileName.startsWith(IndexFileNames.SEGMENTS)){noformat} condition. However, that's not solving the problem, rather working around it.
{noformat}
long fileLength = directory.fileLength(fileName);
if (fileName.startsWith(IndexFileNames.SEGMENTS) && fileLength <= 0) {
  message("init: file " + fileName + " is invalid (has no length), skipping further handling");
}
if (fileName.startsWith(IndexFileNames.SEGMENTS) && fileLength > 0) {
  // This is a commit (segments or segments_N), and
  // it's valid (<= the max gen). Load it, then
  // incref all files it refers to:
{noformat}

We don't see any more errors with this hack (in fact I was able to do 10 backups of a 21GB index in a row). The segments_ files are deleted from the index directory after a while as well, so it's not like they have a positive refcount anywhere else (if I understand correctly).

So, somewhere in between the Snapshot being closed and a new IndexFileDeleter being initialized, something is writing 0 bytes to that segments_* file. As said before, I have not been able to find the place where this happens. As it is only happening under heavy load (I had 25 threads reading and writing the index besides backing up the index to the snapshot file), I'm guessing it may be some synchronization issue?

Perhaps you get an idea now; if you want me to put in more logging or do something else to clear it up, just give me a hint. Cheers
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932719#action_12932719 ] Michael McCandless commented on LUCENE-2729:

Can you print the full stack trace when files are opened for writing? Ie, we need to know who wrote the zero-length segments_2q.

Also, can you post "svn diff" output against Lucene's sources? This would help me understand the log file.

Did you instrument each file-copy being done by the backup process? And, when the backup process starts?

On the next run, can you enable the static infoStream in SegmentInfos.java? This will print details about how it's finding the segments file. Also, can you not turn on infoStream for the IndexWriters against the RAMDirs? This should cut down drastically on the log file.

Lucene normally should never re-use a segments file -- the generation only ever increases on each commit to the index. Yes, SegmentInfos.read sets the generation in SegmentInfos, but this is only called once, upon creating a new SegmentInfos(). It should never be overwritten later by Lucene calling read again. So long as the deletion policy is keeping that commit point alive (segments_2q), Lucene should not be deleting it.

I sometimes see "writing", eg:

{noformat}
13:27:02,554 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Directory[lockID=lucene-23f3e69abbd639b3b4a1db928c1b323d, filename=assetIndex] writing _5t.tvd, directory contents: _2q.fdt(size=969134416, modified=16:41:57,000)
{noformat}

and other times "creating output":

{noformat}
13:26:51,732 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@39f46204] DEBUG org.apache.lucene.store.SimpleFSDirectory - Creating output for _66.tvx
{noformat}

Why the difference? Like, where in the code are you printing these?
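One way to capture those write stack traces, sketched under the assumption that the application can substitute its own Directory (the class name LoggingFSDirectory is hypothetical):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.SimpleFSDirectory;

// Hypothetical wrapper: log a full stack trace, plus which directory
// instance we are in, every time a file is opened for writing.
public class LoggingFSDirectory extends SimpleFSDirectory {
    public LoggingFSDirectory(File path) throws IOException {
        super(path);
    }

    @Override
    public IndexOutput createOutput(String name) throws IOException {
        new Exception("createOutput " + name + " in " + this).printStackTrace(System.err);
        return super.createOutput(name);
    }
}
{code}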
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931081#action_12931081 ] Michael McCandless commented on LUCENE-2729:

bq. I'll see if we can also add some logging to detect (including stack-trace) who opened a file but never closes it..

Actually you can use the MockDirectoryWrapper in Lucene's unit tests for this -- it tracks open inputs & outputs and then throws an exception on close if anything is still open. You should modify it to print all open inputs/outputs instead of throwing an exception, for this usage.
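If porting MockDirectoryWrapper is inconvenient, a minimal sketch of the same tracking idea (all names here are hypothetical):

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker: record a stack trace per opened output, drop it on
// close, and dump whatever is still open instead of throwing.
public class OpenFileTracker {
    private final Map<String, Exception> open =
        Collections.synchronizedMap(new HashMap<String, Exception>());

    public void onOpen(String name)  { open.put(name, new Exception("opened: " + name)); }
    public void onClose(String name) { open.remove(name); }

    public void dumpStillOpen() {
        synchronized (open) {  // required when iterating a synchronizedMap
            for (Exception e : open.values()) {
                e.printStackTrace(System.err);
            }
        }
    }
}
{code}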
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931049#action_12931049 ] Nico Krijnen commented on LUCENE-2729: --

About 'segments_2j' occurring multiple times: I forgot to mention that we have two indexes operating at the same time... one for assets, one for relations. So adding the directory + RAM/disk index in the logging sounds like a very good plan indeed.

About the 0-bytes on close detection logs: it seems we log this a few lines too early (before super.close is called instead of after) - whoops... will correct this and run again.

{quote}
It looks like these 0 length files are being opened (by something) but then never closed.
{quote}

I'll see if we can also add some logging to detect (including stack-trace) who opened a file but never closes it...

{quote}
EG are you sure you're running the JRE with assertions enabled?
{quote}

Pretty sure it is not; we'll turn assertions on.

{quote}
when you print out the full file listing, in addition to the size of each file, also print out the .fileModified result?
{quote}

Will do.
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931044#action_12931044 ] Michael McCandless commented on LUCENE-2729:

When you print the TRACE output saying that a file is being written, can you also print which dir instance we are "in"? I can see that suddenly segments_2j is opened for writing at 11:03:27,653 but it's not clear which directory that is. Maybe print the thread's stack trace at that point? It looks like these 0 length files are being opened (by something) but then never closed.

Also: I see lines with "Writing _2q.fdx" where the directory contents on that line (and lines above it) show _2q.fdx clearly exists. Yet, you had added an assert to check this very case but the assert never tripped? So something is wrong. EG are you sure you're running the JRE with assertions enabled?

Maybe when you print out the full file listing, in addition to the size of each file, also print out the .fileModified result? This can help us confirm that files are getting overwritten.
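For the listing itself, a sketch of a helper that prints the directory instance plus size and fileModified per file (names are hypothetical; Lucene 3.0.x Directory API assumed):

{code}
import java.io.IOException;
import java.util.Date;
import org.apache.lucene.store.Directory;

public class DirListing {
    // One log line per listing: which Directory instance we are "in", plus
    // each file's size and last-modified time.
    public static void logDirContents(String label, Directory dir) throws IOException {
        StringBuilder sb = new StringBuilder(label).append(" dir=").append(dir).append(':');
        for (String name : dir.listAll()) {
            sb.append(' ').append(name)
              .append("(size=").append(dir.fileLength(name))
              .append(", modified=").append(new Date(dir.fileModified(name)))
              .append(')');
        }
        System.err.println(sb);
    }
}
{code}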
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931034#action_12931034 ] Michael McCandless commented on LUCENE-2729:

Also, I like that you detect 0-bytes on close! However, I'm not sure it's working right. EG line 1,017,585 has this:

{noformat}
java.lang.Exception: 11:02:18,639 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@7636d59a] FOUND ONE! closing 0kb IndexOutput, file: _65.fnm, file.length():0, bytesWritten: 0
{noformat}

But then shortly thereafter (line 1,017,604) I see that _65.fnm is 1749 bytes long. Maybe the code you added for this warning is failing to take into account the buffered bytes? Ie, instead of calling file.length() you should call IndexOutput.getFilePointer()? Hmm, but if you call file.length() after the file really is closed... then that ought to be correct.
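A sketch of the corrected check, assuming the detection lives in a subclass of whatever concrete IndexOutput is in use: measure via getFilePointer() before close, since it accounts for buffered bytes, rather than reading File.length() before the flush has happened.

{code}
// Hypothetical close-time check in an IndexOutput subclass
@Override
public void close() throws IOException {
    long written = getFilePointer();  // includes buffered, not-yet-flushed bytes
    super.close();
    if (written == 0) {
        new Exception("FOUND ONE! closed 0-byte IndexOutput").printStackTrace(System.err);
    }
}
{code}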
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931031#action_12931031 ] Michael McCandless commented on LUCENE-2729:

I'll look at the new log, but, quickly, one correction: Zoie writes first into a RAMDir, and then periodically uses addIndexesNoOptimize to fold that RAM index into the "real" one. So you have to be careful when you "grep", because for segments_2j, all of the 'delete "segments_2j"' look like they are in the RAMDir (the thread is ConsumerThread), and all but one of the 'wrote segments file "segments_2j"' are also to the RAMDir.
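For readers unfamiliar with that flow, a minimal sketch of the RAM-then-fold pattern described above (Lucene 3.0.x APIs; the class and method names here are hypothetical, not Zoie's actual code):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class RamFoldSketch {
    // ramDir has already been filled by the indexing thread, committing its
    // own segments_N files inside the RAMDir.
    static void foldRamIntoDisk(RAMDirectory ramDir, Directory diskDir, Analyzer analyzer)
            throws Exception {
        IndexWriter diskWriter =
            new IndexWriter(diskDir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        // fold the RAM segments into the "real" disk index
        diskWriter.addIndexesNoOptimize(new Directory[] { ramDir });
        diskWriter.close();
    }
}
{code}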
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931024#action_12931024 ] Nico Krijnen commented on LUCENE-2729: --

We finished another run today and got some interesting results from the additional logging. The log times are searchable in the log file...

- 09:40:21,996: we are creating a snapshot for our backup, segments filename used for snapshot: 'segments_2j'. On the next line we can see that at that moment segments_2j is 2353 bytes. We log the filenames in the index + their file sizes.
- 11:03:27,653: segments_2j is written to disk. _How can this be while this is the name of the segments filename used to create our Snapshot?_ On the next line we see that it has indeed been written to disk, but has a 0 byte file size.
- 11:08:02,731: First "read past EOF" exception. _It looks like segments filenames are reused, and at a point in time well after our Snapshot is created, the name 'segments_2j' is reused? This somehow causes the segments file to get a size of 0 bytes. Perhaps it is being deleted and written at the same time, or because of the name clash there's no data to be written to the segments file?_
- 11:16:03,300: our backup is finished.

Here is an excerpt of some of the more interesting log entries:

{quote}
09:39:30,719 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Start asset index backup

09:40:21,996 [http-8080-2] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Ctor Snapshot. segments filename: *segments_2j*, commit filenames: [_5f.tvd, _4c.tvf, _5f.tvf, _4d.fdx, _4p.tis, _4c.tvd, _4p.frq, _5e.tvx, _39.tis, _3u.frq, _4d.fdt, _4p.tii, _5f.fnm, _4c.tvx, _39.tii, _55.tis, _2q.fnm, _4p.fdx, _2q.tvf, _5e.tvf, _5e.tvd, _2q.tvd, _5f.tvx, _4p.fdt, _55.tii, _55.nrm, _55.frq, _2q.tvx, _3u.tvx, _5f.fdx, _4c.frq, _5f.fdt, _4c.fdx, _2q.nrm, _4c.fdt, _4p.tvf, _4c.nrm, _3v.tvf, _4c.tii, _3u.prx, _4p.tvd, _4c.tis, _3u.fdt, _5e.fnm, _3v.tvd, _3u.fdx, _55.tvx, _39.fdt, _39.fnm, _5e.prx, _3u.tis, *segments_2j*, _3v.prx, _2q.fdx, _4c.prx, _4d.prx, _4p.tvx, _39.fdx, _3u.tii, _2q.fdt, _4d.frq, _55.fdt, _2q.prx, _5e.tii, _3u.fnm, _55.fdx, _5e.nrm, _5f.tii, _5e.tis, _4d.nrm, _4d.tii, _5e.fdt, _4d.tis, _5f.frq, _5e.fdx, _39.tvd, _39.tvf, _55.fnm, _3v.frq, _2q.frq, _2q.tii, _3v.tii, _5f.prx, _4p.nrm, _5f.nrm, _2q.tis, _3v.tvx, _4c.fnm, _3v.nrm, _55.tvf, _55.tvd, _3u.nrm, _5f.tis, _3v.tis, _3v.fdt, _4d.fnm, _4d.tvf, _3v.fdx, _4d.tvd, _4d.tvx, _4p.prx, _3v.fnm, _5e.frq, _39.prx, _39.tvx, _39.frq, _4p.fnm, _39.nrm, _3u.tvf, _55.prx, _3u.tvd]

11:03:27,653 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Writing *segments_2j*, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276), _2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081), _2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579), _39.nrm(256), _39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042), _39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926), _3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136), _3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(1520), _3v.nrm(122), _3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147), _3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620),
_4c.nrm(9896), _4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779), _4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140), _4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970), _4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56), _4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420), _55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985), _55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0), _5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823), _5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811), _63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470), _63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484), _64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845), _64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369489408), _65.fdx(0), _65.fnm(1749), index.directory(5), segments.gen(20), segments
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930579#action_12930579 ] Michael McCandless commented on LUCENE-2729:

bq. Will do, but we also got the error while zoie was opening a new IndexWriter:

You're right... the next attempt to open an IndexWriter, or an IndexReader that's trying to commit pending deletes, will hit the exc due to the zero-length segments file. So in fact this already helps us further bracket when the truncation happened, because IW 115 was able to successfully open on the assetIndex, but then the attempt to write deletions hits the exc, as does the next attempt to open a writer on the index. So sometime after IW 115 is opened and before the deletions are applied, something truncates these index files.

bq. My guess is that the 'read past EOF' is not really specific to applying deletes, but just happens when the SegmentInfos is loaded from a 0kb file.

That's right. What's really odd is that these 0-length files are very old, ie, indexing has long ago moved on to newer segments. I suspect they are in fact the files protected by the snapshot for backups... so I think it's important we instrument Zoie's backup process next.

{quote}
We ran a test with your patch to throw a RuntimeException when an output already exists. We did get a 'read past EOF', but the additional RuntimeException is never thrown.
{quote}

Hmm. So it sounds like it's not Lucene's SimpleFSDir.createOutput that's truncating these files.
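The attached patch isn't reproduced here, but the check described amounts to something like this sketch (a hypothetical FSDirectory subclass, not the verbatim LUCENE-2729-test1.patch):

{code}
// Hypothetical: fail loudly if createOutput would clobber an existing file
@Override
public IndexOutput createOutput(String name) throws IOException {
    File f = new File(getDirectory(), name);
    if (f.exists() && f.length() > 0) {
        throw new RuntimeException("createOutput would overwrite existing file: " + f);
    }
    return super.createOutput(name);
}
{code}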
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930186#action_12930186 ] Nico Krijnen commented on LUCENE-2729: --

We really appreciate the help and understand that this is not the only thing you are working on ;-) Collecting post-it notes sounds so familiar :)

We ran a test with your patch to throw a RuntimeException when an output already exists. We did get a 'read past EOF', but the additional RuntimeException is never thrown.

We'll add the other log points and do another test run with those. If you have more suggestions for logging, let us know; we won't start the next run until tomorrow anyway...
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930142#action_12930142 ] Nico Krijnen commented on LUCENE-2729: --

{quote}
...spooky "other" exceptions...
{quote}

These are all 'WARN' level and all of them are caused by non-critical timeouts in our code - all caused by the system being under the very heavy load needed to reproduce the bug.

{quote}
Would it be possible to instrument the Zoie code to note as the backup process is copying each file in the snapshot, and at that point print a listing of the directory?
{quote}

Will do, that is a good one. Then we know which files are being 'held' by the Zoie deletion policy for the backup.

{quote}
Also, can you write to the log when Zoie applies deletes? (Looks like it happens in proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs). It's on applying deletes that the corruption is first detected, so, if we log this event we can better bracket the period of time when the corruption happened.
{quote}

Will do, but we also got the error while zoie was opening a new IndexWriter:

{code}
15:25:03,453 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@3d9e7719] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem copying segments: read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:170)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:960)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}

I'll add a log there too. My guess is that the 'read past EOF' is not really specific to applying deletes, but just happens when the SegmentInfos is loaded from a 0kb file.

{quote}
Does Zoie ever open an IndexReader or IndexWriter passing in an existing commit point? Or does it always open the latest commit?
{quote}

I'll try to find out.

{quote}
The timestamps on the zero length files are particularly spooky - the earliest ones are 15:21 (when first EOF is hit), but then also 15:47 and 15:49 on the others. It seems like on 3 separate occasions something truncated the files.
{quote}

Indeed, I thought this was weird too.
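For reference, the two Lucene 3.0.x entry points that the commit-point question above distinguishes look like this sketch (dir, policy, and commit are assumed variables of type Directory, IndexDeletionPolicy, and IndexCommit):

{code}
// Open the latest commit, with a deletion policy guarding commit points:
IndexReader latest = IndexReader.open(dir, policy, false /* readOnly */);

// Open a specific, existing commit point instead:
IndexReader pinned = IndexReader.open(commit, policy, false /* readOnly */);
{code}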
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930130#action_12930130 ] Michael McCandless commented on LUCENE-2729:

Thank you for attaching the IW infoStream output! Sorry it took so long for me to respond.

Aside: it is sad, but there is no master TODO list in open source. It all comes down to our own email inboxes, todo lists, post-it notes all over the place, etc., and (in my case anyway) things sometimes fall past the event horizon. So please, if I don't respond in a day or two on an active issue, bump it again (put a comment on the issue)! I'd much rather people over-nag than under-nag, but unfortunately under-nag is far, far more common, and it causes important issues to languish unnecessarily.

OK, back to the issue :)

I looked through the infoStream but I don't see a smoking gun. Ie, the logs indicate that nowhere did Lucene try to delete/overwrite those zero-length files; I see other files being deleted, so this is what I'd expect, given that ZoieDeletionPolicy is presumably protecting the segments_3t commit point (to back up its files).

I do see some spooky "other" exceptions, though... these are the first 2 exceptions I see in the log:

{noformat}
14:27:41,290 [bigIndexBuilder_QueueProcessor_3] WARN com.ds.acm.logic.impl.AssetManagerImpl - Ignoring AssetNotFoundException trying to make sure all metadata from index is loaded before updating an existing asset
Exception in thread "pool-5-thread-6" java.lang.NullPointerException
    at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:430)
    at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:784)
    at org.apache.coyote.http11.InternalNioOutputBuffer.flush(InternalNioOutputBuffer.java:300)
    at org.apache.coyote.http11.Http11NioProcessor.action(Http11NioProcessor.java:1060)
    at org.apache.coyote.Response.action(Response.java:183)
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:314)
    at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
    at org.apache.catalina.connector.Response.flushBuffer(Response.java:548)
    at org.apache.catalina.connector.ResponseFacade.flushBuffer(ResponseFacade.java:279)
    at org.granite.gravity.AbstractChannel.runReceived(AbstractChannel.java:251)
    at org.granite.gravity.AbstractChannel.runReceive(AbstractChannel.java:199)
    at org.granite.gravity.AsyncReceiver.doRun(AsyncReceiver.java:34)
    at org.granite.gravity.AsyncChannelRunner.run(AsyncChannelRunner.java:52)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
{noformat}

and

{noformat}
14:40:18,382 [Low Memory Detector] WARN com.ds.acm.engine.search.zoieimpl.core.ZoieSystemManager - Ignoring timeout while attempting to flush zoie memory index to disk to free memory
proj.zoie.api.ZoieException: sync timed out
    at proj.zoie.impl.indexing.AsyncDataConsumer.syncWthVersion(AsyncDataConsumer.java:177)
    at proj.zoie.impl.indexing.AsyncDataConsumer.flushEvents(AsyncDataConsumer.java:155)
    at proj.zoie.impl.indexing.ZoieSystem.flushEvents(ZoieSystem.java:308)
    at com.ds.acm.engine.search.zoieimpl.core.ZoieSystemManager.onLowMemory(ZoieSystemManager.java:220)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.ds.util.event.BasicEventBroadcaster$Handler.invokeMethod(BasicEventBroadcaster.java:197)
    at com.ds.util.event.BasicEventBroadcaster$Handler.handle(BasicEventBroadcaster.java:190)
    at com.ds.util.event.BasicEventBroadcaster.fire(BasicEventBroadcaster.java:108)
    at com.ds.util.cache.LowMemoryWarningBroadcaster$1.handleNotification(LowMemoryWarningBroadcaster.java:135)
    at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
    at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
    at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
    at sun.management.Sensor.trigger(Sensor.java:120)
{noformat}

That 2nd exception happens a total of 9 times... and is rather spooky. What does it mean? Ie, why is Zoie timing out on flushing the index to disk, and what does it then do w/ its RAMDir?

I also see a lot of these:

{noformat}
15:50:18,856 [bigIndexBuilder_QueueProcessor_10] WARN
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930094#action_12930094 ] Nico Krijnen commented on LUCENE-2729: -- Thx! We will update, patch and re-run the test.
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930092#action_12930092 ] Simon Willnauer commented on LUCENE-2729: - Ah, I think I have been on the wrong branch - never mind!
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930091#action_12930091 ] Simon Willnauer commented on LUCENE-2729: -

bq. at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)

I just had a quick look at it and I wonder what revision you used for that? SegmentInfos.java does not contain a read call at that line since [revision 892992|https://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/index/SegmentInfos.java?revision=892992&view=markup] - can you clarify?
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930086#action_12930086 ] Nico Krijnen commented on LUCENE-2729: --

Any ideas on what could be happening? It sounds like IndexWriter is the only one that is modifying these files; zoie only seems to be reading from them to make the backup. What should we look for in the IndexWriter's infoStream?
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929702#action_12929702 ] John Wang commented on LUCENE-2729: --- Zoie does not touch index files; it only adds an index.directory file containing version information.
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927043#action_12927043 ] Michael McCandless commented on LUCENE-2729: Somehow we have to locate the event that causes the truncation of the files. Can you enable IndexWriter's infoStream and then get the corruption to happen, and post the results?
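For anyone reproducing this: in Lucene 3.0.x the infoStream is turned on with a single setter on the writer. A minimal sketch (the index path, analyzer choice, and log file name are placeholders):

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintStream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamSetup {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/assetIndex")),
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // From here on, every flush, merge, commit, and file deletion the
        // writer performs is logged to iw-infostream.log.
        writer.setInfoStream(new PrintStream(new FileOutputStream("iw-infostream.log"), true));
    }
}
{code}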
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927040#action_12927040 ] Nico Krijnen commented on LUCENE-2729: --

In the meantime, we also did a test with a checkout of the latest lucene_3_0 branch (@2010-11-01), which should include the fix that Jason mentions. It does not seem to make a difference though: we still get a 'read past EOF'.

On the last run we did get a slightly different stack trace. This time the 'read past EOF' happens when the zoie RAM index is written to the zoie disk index. Last time it occurred a little earlier, in BaseSearchIndex#loadFromIndex, while committing deletes to the disk IndexReader. This could be just a coincidence though. My feeling is still that the 'read past EOF' is just a result/symptom of something else that happened just before it - still trying to figure out what that could be...

{code}
15:25:03,453 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@3d9e7719] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader -
Problem copying segments: read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:170)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:960)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927031#action_12927031 ] Jason Rutherglen commented on LUCENE-2729: -- With Solr 1.4.2, on disk-full, .del files were being written with a file length of zero; however, that is supposed to be fixed by https://issues.apache.org/jira/browse/LUCENE-2593. This doesn't appear to be the same problem, though, because more than just the .del files are of zero length.
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926908#action_12926908 ] Nico Krijnen commented on LUCENE-2729: --

bq. Was there no original root cause here? Eg disk full?

This was one of the first things I thought of, but the disk has more than enough free space: 200 GB. Also, for this test we write the backup to a different disk - both for better performance and to prevent the disk with the index on it from running out of free space.

bq. Or is the "read past EOF" on closing an IndexReader w/ pending deletes really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional logging, but we have not been able to find anything weird happening before this error. I do expect something weird must have happened to cause the 'read past EOF'. Do you have any clues as to what we could look for? That might narrow the search. We are able to consistently reproduce this on our test environment, so if you have clues to specific debug logging that should be turned on, we can do another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. As far as I can see, the basic backup procedure is to grab the last 'commit snapshot', prevent it from being deleted (ZoieIndexDeletionPolicy), and write all the files mentioned in the commit snapshot to a NIO WritableByteChannel.
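A hedged sketch of that copy step, assuming the protected snapshot is exposed as a Lucene IndexCommit; CommitCopier and its signature are illustrative, not actual Zoie code:

{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

import org.apache.lucene.index.IndexCommit;

public final class CommitCopier {
    // Streams every file named by the protected commit point to the target
    // channel. While the deletion policy holds the commit, Lucene should not
    // delete these files - and, since it writes segment files once, it
    // should never truncate or overwrite them either.
    public static void copyCommit(IndexCommit commit, File indexDir, WritableByteChannel out)
            throws IOException {
        for (String name : commit.getFileNames()) {
            FileChannel in = new FileInputStream(new File(indexDir, name)).getChannel();
            try {
                long pos = 0;
                long size = in.size();
                while (pos < size) {
                    pos += in.transferTo(pos, size - pos, out);
                }
            } finally {
                in.close();
            }
        }
    }
}
{code}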
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926907#action_12926907 ] Michael McCandless commented on LUCENE-2729:

That long string of length-0 files is very bizarre.

Was there no original root cause here? Eg disk full? Or is the "read past EOF" on closing an IndexReader w/ pending deletes really the first exception you see?

Does zoie somehow touch the index files? Taking a backup is fundamentally a read-only op on the index, so that process shouldn't by itself truncate index files.

Something is somehow reaching in and zero-ing out these files. I don't think Lucene itself would do this. For example, take the series of _6i.XXX zero'd files... Lucene writes these files roughly in sequence, so if something bad had happened while writing the postings, the .nrm file should not even exist.

So we need to figure out who is truncating these files...
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926859#action_12926859 ] Nico Krijnen commented on LUCENE-2729: --

A second file listing from another test run, same result: read past EOF

{code}
jteb:assetIndex jteb$ ls -la
total 38739848
drwxr-xr-x 2 jteb jteb 4964 26 okt 11:51 .
drwxr-xr-x 3 jteb jteb 204 22 okt 11:42 ..
-rw-r--r-- 1 jteb jteb 969134416 18 okt 16:41 _2q.fdt
-rw-r--r-- 1 jteb jteb 36652 18 okt 16:41 _2q.fdx
-rw-r--r-- 1 jteb jteb 276 18 okt 16:41 _2q.fnm
-rw-r--r-- 1 jteb jteb 4685726 18 okt 16:41 _2q.frq
-rw-r--r-- 1 jteb jteb 9166 18 okt 16:41 _2q.nrm
-rw-r--r-- 1 jteb jteb 393230403 18 okt 16:42 _2q.prx
-rw-r--r-- 1 jteb jteb 7447 18 okt 16:42 _2q.tii
-rw-r--r-- 1 jteb jteb 746299 18 okt 16:42 _2q.tis
-rw-r--r-- 1 jteb jteb 8394 18 okt 16:42 _2q.tvd
-rw-r--r-- 1 jteb jteb 599185081 18 okt 16:42 _2q.tvf
-rw-r--r-- 1 jteb jteb 73300 18 okt 16:42 _2q.tvx
-rw-r--r-- 1 jteb jteb 2061261675 18 okt 16:44 _39.fdt
-rw-r--r-- 1 jteb jteb 1012 18 okt 16:44 _39.fdx
-rw-r--r-- 1 jteb jteb 276 18 okt 16:44 _39.fnm
-rw-r--r-- 1 jteb jteb 17754579 18 okt 16:44 _39.frq
-rw-r--r-- 1 jteb jteb 256 18 okt 16:44 _39.nrm
-rw-r--r-- 1 jteb jteb 121067407 18 okt 16:44 _39.prx
-rw-r--r-- 1 jteb jteb 137511 18 okt 16:44 _39.tii
-rw-r--r-- 1 jteb jteb 11726653 18 okt 16:44 _39.tis
-rw-r--r-- 1 jteb jteb 185 18 okt 16:44 _39.tvd
-rw-r--r-- 1 jteb jteb 233037042 18 okt 16:44 _39.tvf
-rw-r--r-- 1 jteb jteb 2020 18 okt 16:44 _39.tvx
-rw-r--r-- 1 jteb jteb 1595882722 18 okt 16:45 _3u.fdt
-rw-r--r-- 1 jteb jteb 63692 18 okt 16:45 _3u.fdx
-rw-r--r-- 1 jteb jteb 330 18 okt 16:45 _3u.fnm
-rw-r--r-- 1 jteb jteb 8001869 18 okt 16:45 _3u.frq
-rw-r--r-- 1 jteb jteb 15926 18 okt 16:45 _3u.nrm
-rw-r--r-- 1 jteb jteb 647374863 18 okt 16:45 _3u.prx
-rw-r--r-- 1 jteb jteb 11319 18 okt 16:45 _3u.tii
-rw-r--r-- 1 jteb jteb 1168399 18 okt 16:45 _3u.tis
-rw-r--r-- 1 jteb jteb 14209 18 okt 16:45 _3u.tvd
-rw-r--r-- 1 jteb jteb 986370136 18 okt 16:46 _3u.tvf
-rw-r--r-- 1 jteb jteb 127380 18 okt 16:46 _3u.tvx
-rw-r--r-- 1 jteb jteb 2057147455 18 okt 16:47 _3v.fdt
-rw-r--r-- 1 jteb jteb 476 18 okt 16:47 _3v.fdx
-rw-r--r-- 1 jteb jteb 384 18 okt 16:47 _3v.fnm
-rw-r--r-- 1 jteb jteb 1520 18 okt 16:47 _3v.frq
-rw-r--r-- 1 jteb jteb 122 18 okt 16:47 _3v.nrm
-rw-r--r-- 1 jteb jteb 109724024 18 okt 16:47 _3v.prx
-rw-r--r-- 1 jteb jteb 132491 18 okt 16:47 _3v.tii
-rw-r--r-- 1 jteb jteb 11457688 18 okt 16:47 _3v.tis
-rw-r--r-- 1 jteb jteb 114 18 okt 16:47 _3v.tvd
-rw-r--r-- 1 jteb jteb 211902147 18 okt 16:48 _3v.tvf
-rw-r--r-- 1 jteb jteb 948 18 okt 16:48 _3v.tvx
-rw-r--r-- 1 jteb jteb 2691565961 18 okt 16:49 _4c.fdt
-rw-r--r-- 1 jteb jteb 39572 18 okt 16:49 _4c.fdx
-rw-r--r-- 1 jteb jteb 276 18 okt 16:49 _4c.fnm
-rw-r--r-- 1 jteb jteb 18724620 18 okt 16:49 _4c.frq
-rw-r--r-- 1 jteb jteb 9896 18 okt 16:49 _4c.nrm
-rw-r--r-- 1 jteb jteb 590255960 18 okt 16:50 _4c.prx
-rw-r--r-- 1 jteb jteb 141243 18 okt 16:50 _4c.tii
-rw-r--r-- 1 jteb jteb 12185869 18 okt 16:50 _4c.tis
-rw-r--r-- 1 jteb jteb 9894 18 okt 16:50 _4c.tvd
-rw-r--r-- 1 jteb jteb 932649779 18 okt 16:51 _4c.tvf
-rw-r--r-- 1 jteb jteb 79140 18 okt 16:51 _4c.tvx
-rw-r--r-- 1 jteb jteb 2398908136 18 okt 16:52 _4d.fdt
-rw-r--r-- 1 jteb jteb 548 18 okt 16:52 _4d.fdx
-rw-r--r-- 1 jteb jteb 354 18 okt 16:52 _4d.fnm
-rw-r--r-- 1 jteb jteb 24581614 18 okt 16:52 _4d.frq
-rw-r--r-- 1 jteb jteb 140 18 okt 16:52 _4d.nrm
-rw-r--r-- 1 jteb jteb 158243133 18 okt 16:52 _4d.prx
-rw-r--r-- 1 jteb jteb 141948 18 okt 16:52 _4d.tii
-rw-r--r-- 1 jteb jteb 12259425 18 okt 16:52 _4d.tis
-rw-r--r-- 1 jteb jteb 140 18 okt 16:52 _4d.tvd
-rw-r--r-- 1 jteb jteb 303769970 18 okt 16:53 _4d.tvf
-rw-r--r-- 1 jteb jteb 1092 18 okt 16:53 _4d.tvx
-rw-r--r-- 1 jteb jteb 1081212027 18 okt 16:53 _4p.fdt
-rw-r--r-- 1 jteb jteb 212 18 okt 16:53 _4p.fdx
-rw-r--r-- 1 jteb jteb 354 18 okt 16:53 _4p.fnm
-rw-r--r-- 1 jteb jteb 8294102 18 okt 16:53 _4p.frq
-rw-r--r-- 1 jteb jteb 56 18 okt 16:53 _4p.nrm
-rw-r--r-- 1 jteb jteb 60513257 18 okt 16:53 _4p.prx
-rw-r--r-- 1 jteb jteb 134898 18 okt 16:53 _4p.tii
-rw-r--r-- 1 jteb jteb 11376730 18 okt 16:53 _4p.tis
-rw-r--r-- 1 jteb jteb 56 18 okt 16:53 _4p.tvd
-rw-r--r-- 1 jteb jteb 116715012 18 okt 16:53 _4p.tvf
-rw-r--r-- 1 jteb jteb 420 18 okt 16:53 _4p.tvx
-rw-r--r-- 1 jteb jteb 787581180 18 okt 16:54
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926858#action_12926858 ] Nico Krijnen commented on LUCENE-2729: --
{code}
jteb:assetIndex jteb$ ls -la
total 41550832
drwxr-xr-x 2 jteb jteb 4862 1 nov 08:52 .
drwxr-xr-x 4 jteb jteb 238 29 okt 14:10 ..
-rw-r--r--@ 1 jteb jteb 21508 1 nov 08:52 .DS_Store
-rw-r--r-- 1 jteb jteb 969134416 18 okt 16:41 _2q.fdt
-rw-r--r-- 1 jteb jteb 36652 18 okt 16:41 _2q.fdx
-rw-r--r-- 1 jteb jteb 276 18 okt 16:41 _2q.fnm
-rw-r--r-- 1 jteb jteb 4685726 18 okt 16:41 _2q.frq
-rw-r--r-- 1 jteb jteb 9166 18 okt 16:41 _2q.nrm
-rw-r--r-- 1 jteb jteb 393230403 18 okt 16:42 _2q.prx
-rw-r--r-- 1 jteb jteb 7447 18 okt 16:42 _2q.tii
-rw-r--r-- 1 jteb jteb 746299 18 okt 16:42 _2q.tis
-rw-r--r-- 1 jteb jteb 8394 18 okt 16:42 _2q.tvd
-rw-r--r-- 1 jteb jteb 599185081 18 okt 16:42 _2q.tvf
-rw-r--r-- 1 jteb jteb 73300 18 okt 16:42 _2q.tvx
-rw-r--r-- 1 jteb jteb 1595882722 18 okt 16:45 _3u.fdt
-rw-r--r-- 1 jteb jteb 63692 18 okt 16:45 _3u.fdx
-rw-r--r-- 1 jteb jteb 330 18 okt 16:45 _3u.fnm
-rw-r--r-- 1 jteb jteb 8001869 18 okt 16:45 _3u.frq
-rw-r--r-- 1 jteb jteb 15926 18 okt 16:45 _3u.nrm
-rw-r--r-- 1 jteb jteb 647374863 18 okt 16:45 _3u.prx
-rw-r--r-- 1 jteb jteb 11319 18 okt 16:45 _3u.tii
-rw-r--r-- 1 jteb jteb 1168399 18 okt 16:45 _3u.tis
-rw-r--r-- 1 jteb jteb 14209 18 okt 16:45 _3u.tvd
-rw-r--r-- 1 jteb jteb 986370136 18 okt 16:46 _3u.tvf
-rw-r--r-- 1 jteb jteb 127380 18 okt 16:46 _3u.tvx
-rw-r--r-- 1 jteb jteb 2691565961 18 okt 16:49 _4c.fdt
-rw-r--r-- 1 jteb jteb 39572 18 okt 16:49 _4c.fdx
-rw-r--r-- 1 jteb jteb 276 18 okt 16:49 _4c.fnm
-rw-r--r-- 1 jteb jteb 18724620 18 okt 16:49 _4c.frq
-rw-r--r-- 1 jteb jteb 9896 18 okt 16:49 _4c.nrm
-rw-r--r-- 1 jteb jteb 590255960 18 okt 16:50 _4c.prx
-rw-r--r-- 1 jteb jteb 141243 18 okt 16:50 _4c.tii
-rw-r--r-- 1 jteb jteb 12185869 18 okt 16:50 _4c.tis
-rw-r--r-- 1 jteb jteb 9894 18 okt 16:50 _4c.tvd
-rw-r--r-- 1 jteb jteb 932649779 18 okt 16:51 _4c.tvf
-rw-r--r-- 1 jteb jteb 79140 18 okt 16:51 _4c.tvx
-rw-r--r-- 1 jteb jteb 2398908136 18 okt 16:52 _4d.fdt
-rw-r--r-- 1 jteb jteb 548 18 okt 16:52 _4d.fdx
-rw-r--r-- 1 jteb jteb 354 18 okt 16:52 _4d.fnm
-rw-r--r-- 1 jteb jteb 24581614 18 okt 16:52 _4d.frq
-rw-r--r-- 1 jteb jteb 140 18 okt 16:52 _4d.nrm
-rw-r--r-- 1 jteb jteb 158243133 18 okt 16:52 _4d.prx
-rw-r--r-- 1 jteb jteb 141948 18 okt 16:52 _4d.tii
-rw-r--r-- 1 jteb jteb 12259425 18 okt 16:52 _4d.tis
-rw-r--r-- 1 jteb jteb 140 18 okt 16:52 _4d.tvd
-rw-r--r-- 1 jteb jteb 303769970 18 okt 16:53 _4d.tvf
-rw-r--r-- 1 jteb jteb 1092 18 okt 16:53 _4d.tvx
-rw-r--r-- 1 jteb jteb 4118409126 29 okt 16:26 _6g.fdt
-rw-r--r-- 1 jteb jteb 1484 29 okt 16:26 _6g.fdx
-rw-r--r-- 1 jteb jteb 384 29 okt 16:17 _6g.fnm
-rw-r--r-- 1 jteb jteb 35294399 29 okt 16:27 _6g.frq
-rw-r--r-- 1 jteb jteb 374 29 okt 16:27 _6g.nrm
-rw-r--r-- 1 jteb jteb 230791431 29 okt 16:27 _6g.prx
-rw-r--r-- 1 jteb jteb 143860 29 okt 16:27 _6g.tii
-rw-r--r-- 1 jteb jteb 12491845 29 okt 16:27 _6g.tis
-rw-r--r-- 1 jteb jteb 295 29 okt 16:28 _6g.tvd
-rw-r--r-- 1 jteb jteb 444939185 29 okt 16:28 _6g.tvf
-rw-r--r-- 1 jteb jteb 2964 29 okt 16:28 _6g.tvx
-rw-r--r-- 1 jteb jteb 2758122671 29 okt 16:31 _6h.fdt
-rw-r--r-- 1 jteb jteb 96388 29 okt 16:31 _6h.fdx
-rw-r--r-- 1 jteb jteb 723 29 okt 16:29 _6h.fnm
-rw-r--r-- 1 jteb jteb 51142700 29 okt 16:31 _6h.frq
-rw-r--r-- 1 jteb jteb 24100 29 okt 16:31 _6h.nrm
-rw-r--r-- 1 jteb jteb 189178767 29 okt 16:31 _6h.prx
-rw-r--r-- 1 jteb jteb 270472 29 okt 16:31 _6h.tii
-rw-r--r-- 1 jteb jteb 21710405 29 okt 16:31 _6h.tis
-rw-r--r-- 1 jteb jteb 23873 29 okt 16:31 _6h.tvd
-rw-r--r-- 1 jteb jteb 394088075 29 okt 16:31 _6h.tvf
-rw-r--r-- 1 jteb jteb 192772 29 okt 16:31 _6h.tvx
-rw-r--r-- 1 jteb jteb 0 29 okt 20:22 _8b.fnm
-rw-r--r-- 1 jteb jteb 0 29 okt 20:26 _8b.tvd
-rw-r--r-- 1 jteb jteb 0 29 okt 20:26 _8b.tvf
-rw-r--r-- 1 jteb jteb 0 29 okt 20:22 _8c.fdt
-rw-r--r-- 1 jteb jteb 0 29 okt 20:22 _8c.fdx
-rw-r--r-- 1 jteb jteb 0 29 okt 20:26 _8c.frq
-rw-r--r-- 1 jteb jteb 0 29 okt 20:24 _8c.tii
-rw-r--r-- 1 jteb jteb 0 29 okt 20:24 _8c.tis
-rw-r--r-- 1 jteb jteb 0 29 okt 20:28 _8c.tvf
-rw-r--r-- 1 jteb jteb 0 29 okt 20:30 _8c.tvx
-rw-r--r-- 1 jteb jteb 0 29 okt 20:24 _8d.fdt
-rw-r--r-- 1 jteb jteb 0 29 okt 20:25 _8d.fdx
{code}
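The zero-length _8b/_8c/_8d files at the end of this listing are the corruption described in the issue, and the reporter's manual recovery was to stop the server and delete them. A minimal sketch of that scan, using plain java.io (the path is hypothetical; it only reports the files, and any deletion should happen with the server stopped):
{code}
import java.io.File;

public class FindZeroByteFiles {
    public static void main(String[] args) {
        // Hypothetical path; substitute the real index directory.
        File indexDir = new File("/path/to/assetIndex");
        File[] files = indexDir.listFiles();
        if (files == null) {
            System.err.println("not a directory: " + indexDir);
            return;
        }
        for (File f : files) {
            // Flag the 0 kb segment files described in this issue.
            if (f.isFile() && f.length() == 0) {
                System.out.println("zero-length: " + f.getName());
            }
        }
    }
}
{code}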
[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926360#action_12926360 ] Jason Rutherglen commented on LUCENE-2729: -- Post a listing of the index files with their lengths, ie, ls -la.
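Since the issue environment includes Windows 2008 Server, a cross-platform stand-in for ls -la may be handy. A minimal sketch using Lucene's own Directory API, assuming lucene-core 3.0.x (the path is hypothetical):
{code}
import java.io.File;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ListIndexFiles {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; substitute the real index directory.
        Directory dir = FSDirectory.open(new File("/path/to/assetIndex"));
        try {
            // Print each index file with its length, like ls -la.
            for (String name : dir.listAll()) {
                System.out.println(name + "\t" + dir.fileLength(name) + " bytes");
            }
        } finally {
            dir.close();
        }
    }
}
{code}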