[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-4975:
-------------------------------
    Attachment: LUCENE-4975.patch

Patch fixes a bug in IndexReplicationHandler (still need to fix in IndexAndTaxonomy) and adds some nocommits which I want to take care of before I commit it.

However, I hit a new test failure, which reproduces with the following command: {{ant test -Dtestcase=IndexReplicationClientTest -Dtests.method=testConsistencyOnExceptions -Dtests.seed=EAF5294292642F1:6EE70BB59A9FC3CA}}. The error is weird. I ran the test with -Dtests.verbose=true and here are the troubling parts of the log:

{noformat}
ReplicationThread-index: MockDirectoryWrapper: now throw random exception during open file=segments_a
java.lang.Throwable
    at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOExceptionOnOpen(MockDirectoryWrapper.java:364)
    at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:522)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:281)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:340)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:515)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:682)
    at org.apache.lucene.replicator.IndexReplicationHandler.revisionReady(IndexReplicationHandler.java:208)
    at org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:248)
    at org.apache.lucene.replicator.ReplicationClient.access$1(ReplicationClient.java:188)
    at org.apache.lucene.replicator.ReplicationClient$ReplicationThread.run(ReplicationClient.java:76)
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: current segments file is "segments_9"; deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@117da39a
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: load commit "segments_9"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: load commit "segments_a"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: now checkpoint "_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1" [9 segments ; isCommit = false]
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: 0 msec to checkpoint
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: deleteCommits: now decRef commit "segments_9"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "segments_9"
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: create=false
....
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: startCommit(): start
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: startCommit index=_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1 changeCount=1
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: done all syncs: [_2.si, _7.si, _5.cfs, _1.fnm, _4.cfs, _8.si, _4.cfe, _5.cfe, _0.si, _0.fnm, _6.cfe, _8.cfs, _3.cfs, _4.si, _7.cfe, _2.cfs, _5.si, _6.cfs, _1.fdx, _8.cfe, _1.fdt, _1.si, _7.cfs, _0.fdx, _3.si, _6.si, _3.cfe, _2.cfe, _0.fdt]
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: pendingCommit != null
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: wrote segments file "segments_a"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: now checkpoint "_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1" [9 segments ; isCommit = true]
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: deleteCommits: now decRef commit "segments_a"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.cfe"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.cfs"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.si"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: 0 msec to checkpoint
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: done
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: at close: _0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1
IndexReplicationHandler 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: updateHandlerState(): currentVersion=a currentRevisionFiles={index=[Lorg.apache.lucene.replicator.RevisionFile;@9bc2e26e}
IndexReplicationHandler 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: {version=9}
{noformat}

I debug traced it and here's what I think is happening:

* MDW throws FNFE for segments_a on sis.read(dir), so the SegmentInfos that gets read sees segments_9 as the current good commit. IW's segmentInfos.commitData stores version=9, which corresponds to segments_9.
* IFD lists the files in the Directory, finds both segments_a and segments_9 and, through a series of calls, deletes segments_9 and keeps segments_a, since it is newer.
* IW ctor, line 719, increments changeCount, since IFD.startingCommitDeleted is true -- which happens b/c IFD is initialized with segments_9, but finds segments_a and therefore deletes segments_9.
* IW then makes a commit, with the commit data from segments_9 ("version=9"), to a new commit point at generation 10 (the "a" in segments_a).
* The Replicator's latest version is gen=10, the handler reads gen=10 from the index, but with the wrong commitData, and therefore the test fails.

I still want to review all this again, to double-check my understanding, but it looks like something bad is happening between IW and IFD. At least from the perspective of the replicator, the index shouldn't "go forward" by new IW().close(). If I modify the handler to do:

{code}
// indexDir and conf are placeholders for the handler's Directory and IndexWriterConfig
IndexWriter writer = new IndexWriter(indexDir, conf);
writer.deleteUnusedFiles(); // remove index files that are no longer used
writer.rollback();          // close the writer without committing
{code}

The test passes. But is this the right solution -- i.e.
guarantee that IW never commits? Or is this a bug in IW?

> Add Replication module to Lucene
> --------------------------------
>
>                 Key: LUCENE-4975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4975
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch
>
>
> I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes for e.g. high-availability, taking hot backups etc.
> I will upload a patch soon where I'll describe in general how it works.
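
For anyone following the sequence in the comment above (fallback read of segments_9, IFD deleting the starting commit, IW bumping changeCount and committing stale commit data), here is a toy model of it in plain Java. It is only a sketch and does not use Lucene: the class StaleCommitModel, its fields and the method openAndClose are hypothetical names made up for illustration, and the logic mirrors the steps as described in the comment, not the actual IW/IFD code.

```java
import java.util.TreeMap;

/** Toy model of the reported sequence; hypothetical names, no Lucene involved. */
final class StaleCommitModel {

    /** Commit points in the "directory": generation -> commit data. */
    final TreeMap<Long, String> commits = new TreeMap<>();

    /** Generation whose segments file fails to open (stands in for the MDW exception). */
    long unreadableGen = -1;

    /** Returns the newest generation that can be opened, falling back to older ones. */
    long readCurrentGen() {
        for (long gen : commits.descendingKeySet()) {
            if (gen != unreadableGen) {
                return gen;
            }
        }
        throw new IllegalStateException("no readable commit");
    }

    /** Models "new IW().close()"; returns the generation committed, or -1 if none. */
    long openAndClose() {
        // Step 1: the writer starts from whatever commit it managed to read.
        long startGen = readCurrentGen();
        String commitData = commits.get(startGen);

        // Step 2: the deleter keeps only the newest commit it finds on disk.
        long newest = commits.lastKey();
        boolean startingCommitDeleted = startGen != newest;
        commits.headMap(newest).clear();

        // Step 3: a deleted starting commit counts as a pending change.
        int changeCount = startingCommitDeleted ? 1 : 0;
        if (changeCount == 0) {
            return -1; // nothing to commit on close
        }

        // Step 4: close() commits to the next generation, carrying the old commit data.
        long newGen = startGen + 1;
        commits.put(newGen, commitData);
        return newGen;
    }

    public static void main(String[] args) {
        StaleCommitModel dir = new StaleCommitModel();
        dir.commits.put(9L, "version=9");
        dir.commits.put(10L, "version=a");
        dir.unreadableGen = 10; // segments_a cannot be opened on the first read

        long written = dir.openAndClose();
        // Generations are rendered in radix 36 here, so 10 prints as "a".
        System.out.println("segments_" + Long.toString(written, 36)
                + " -> " + dir.commits.get(written));
    }
}
```

In this model, setting unreadableGen back to -1 makes openAndClose() return -1 and leaves segments_a with its original commit data, which matches the expectation that opening and closing a writer should not move the index forward.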