[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Jan te Beest (JIRA) Wed, 24 Nov 2010 00:42:43 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935256#action_12935256
 ]


Jan te Beest commented on LUCENE-2729:
--------------------------------------

First of all, many thanks for your support (and patience at it)!

{quote}
Are you running with assertions enabled? Maybe we'll catch
something...
{quote}
Well I had, but somehow the option to the JVM got lost when switching between 
git branches. 
I'll turn it on again before testing with Xiaoyang's patches merged into my 
local git repository.

Looking at the patches, I have a good feeling they will solve the issue.

Xiaoyang, I see a patch against RealtimeIndexDataLoader as well. Does it have 
anything to do with the 'load manager stopped' errors I encountered?
I have to say though that I didn't see those errors when the backup process 
didn't fail.

{quote}
Also: can you try upgrading to the latest 3.0.x tip (svn checkout
https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)?
This doesn't sound like any bug we've fixed on 3.0.x but it's worth a
shot 
{quote}
I'll do that anyway, I like being at the tip of the branch :)

{quote}
I see strange output for the backup, eg "Backup to file _7c.prx,
dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" - why
is that? (A number of files show this, but then a bunch of others
look like "correct" sizes). Are you sure the backup copy of
segments_5e is actually valid?
{quote}
We actually encounter this regularly for any files other than segments_*, it 
seems harmless though.
As for the segments_5e being valid at that point; yes it is.

bq. You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as volatile.
You're totally right, how could I have missed that?

OK, I'll do some more tests and let you know the outcome.


> Index corruption after 'read past EOF' under heavy update load and snapshot 
> export
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2729
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1, 3.0.2
>         Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
> Integrated with zoie (using a zoie snapshot from 2010-08-06: 
> zoie-2.0.0-snapshot-20100806.jar).
>            Reporter: Nico Krijnen
>         Attachments: 2010-11-02 IndexWriter infoStream log.zip, 
> backup_force_failure2.log.zip, eof-extra-logging-4-analysis.txt, 
> eof-extra-logging-4.log.zip, LUCENE-2729-test1.patch, 
> read-past-eof-debugging.zip
>
>
> We have a system running lucene and zoie. We use lucene as a content store 
> for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
> backups of the index. This works fine for small indexes and when there are 
> not a lot of changes to the index when the backup is made.
> On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
> is being changed a lot (lots of document additions and/or deletions), we 
> almost always get a 'read past EOF' at some point, followed by lots of 'Lock 
> obtain timed out'.
> At that point we get lots of 0 kb files in the index, data gets lots, and the 
> index is unusable.
> When we stop our server, remove the 0kb files and restart our server, the 
> index is operational again, but data has been lost.
> I'm not sure if this is a zoie or a lucene issue, so i'm posting it to both. 
> Hopefully someone has some ideas where to look to fix this.
> Some more details...
> Stack trace of the read past EOF and following Lock obtain timed out:
> {code}
> 78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
>     ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
> java.io.IOException: read past EOF
>     at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>     at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>     at 
> org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
>     at 
> org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
>     at 
> org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
>     at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
>     at 
> proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
>     at 
> proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
>     at 
> proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at 
> proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at 
> proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at 
> proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> 579336 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
>     ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
>     Problem copying segments: Lock obtain timed out: 
>     org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
>     org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
>     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
>     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
>     at 
> proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
>     at 
> proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
>     at 
> proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at 
> proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at 
> proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at 
> proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> {code}
> We get exactly the same behavour on both OS X and on Windows. On both zoie is 
> using a SimpleFSDirectory.
> We also use a SingleInstanceLockFactory (since our process is the only one 
> working with the index), but we get the same behaviour with a NativeFSLock.
> The snapshot backup is being made by calling:
> *proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*
> Same issue in zoie JIRA:
> http://snaprojects.jira.com/browse/ZOIE-51

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Reply via email to