[ 
https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4975:
-------------------------------

    Attachment: LUCENE-4975.patch

Patch fixes a bug in IndexReplicationHandler (still need to fix it in 
IndexAndTaxonomy) and adds some nocommits which I want to take care of before I 
commit it.

However, I hit a new test failure, which reproduces with the following command 
{{ant test -Dtestcase=IndexReplicationClientTest 
-Dtests.method=testConsistencyOnExceptions 
-Dtests.seed=EAF5294292642F1:6EE70BB59A9FC3CA}}.

The error is weird. I ran the test w/ -Dtests.verbose=true and here are the 
troubling parts from the log:

{noformat}
ReplicationThread-index: MockDirectoryWrapper: now throw random exception during open file=segments_a
java.lang.Throwable
        at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOExceptionOnOpen(MockDirectoryWrapper.java:364)
        at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:522)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:281)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:340)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:515)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:682)
        at org.apache.lucene.replicator.IndexReplicationHandler.revisionReady(IndexReplicationHandler.java:208)
        at org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:248)
        at org.apache.lucene.replicator.ReplicationClient.access$1(ReplicationClient.java:188)
        at org.apache.lucene.replicator.ReplicationClient$ReplicationThread.run(ReplicationClient.java:76)
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: current 
segments file is "segments_9"; 
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@117da39a
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: load 
commit "segments_9"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: load 
commit "segments_a"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: now checkpoint 
"_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 
_7(5.0):c1 _8(5.0):c1" [9 segments ; isCommit = false]
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: 0 msec to 
checkpoint
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: deleteCommits: 
now decRef commit "segments_9"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete 
"segments_9"
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: create=false

....

IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: startCommit(): 
start
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: startCommit 
index=_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 
_6(5.0):c1 _7(5.0):c1 _8(5.0):c1 changeCount=1
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: done all syncs: 
[_2.si, _7.si, _5.cfs, _1.fnm, _4.cfs, _8.si, _4.cfe, _5.cfe, _0.si, _0.fnm, 
_6.cfe, _8.cfs, _3.cfs, _4.si, _7.cfe, _2.cfs, _5.si, _6.cfs, _1.fdx, _8.cfe, 
_1.fdt, _1.si, _7.cfs, _0.fdx, _3.si, _6.si, _3.cfe, _2.cfe, _0.fdt]
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: 
pendingCommit != null
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: wrote 
segments file "segments_a"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: now checkpoint 
"_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 
_7(5.0):c1 _8(5.0):c1" [9 segments ; isCommit = true]
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: deleteCommits: 
now decRef commit "segments_a"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.cfe"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.cfs"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.si"
IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: 0 msec to 
checkpoint
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: done
IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: at close: 
_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 
_7(5.0):c1 _8(5.0):c1
IndexReplicationHandler 0 [Wed May 08 22:47:46 WST 2013; 
ReplicationThread-index]: updateHandlerState(): currentVersion=a 
currentRevisionFiles={index=[Lorg.apache.lucene.replicator.RevisionFile;@9bc2e26e}
IndexReplicationHandler 0 [Wed May 08 22:47:46 WST 2013; 
ReplicationThread-index]: {version=9}
{noformat}

I traced it in the debugger and here's what I think is happening:

* MDW throws FNFE for segments_a on sis.read(dir), so the SegmentInfos that is 
read treats segments_9 as the current good commit. IW's 
segmentInfos.commitData stores version=9, which corresponds to segments_9.
* IFD lists the files in the Directory, finds both segments_a and 
segments_9 and, through a series of calls, deletes segments_9 and keeps 
segments_a, since it is newer.
* IW ctor, line 719, increments changeCount, since IFD.startingCommitDeleted is 
true. That happens b/c IFD is initialized with segments_9, but finds 
segments_a and therefore deletes segments_9.
* IW then makes a commit, with the commit data from segments_9 ("version=9"), 
to a new commit point, generation 10 (segments_a; generations are encoded in 
base 36).
* The Replicator's latest version is gen=10, the handler reads gen=10 from the 
index, but with the wrong commitData, and therefore the test fails.

I still want to review all this again, to double-check my understanding, but it 
looks like something bad is happening between IW and IFD. At least from the 
perspective of the replicator, the index shouldn't "go forward" as a result of 
new IW().close().

If I modify the handler to do:
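{code}
// dir and conf stand in for the handler's Directory and IndexWriterConfig
IndexWriter writer = new IndexWriter(dir, conf);
writer.deleteUnusedFiles(); // remove index files that are no longer used
writer.rollback();          // close the writer without committing
{code}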

The test passes. But is this the right solution -- i.e. guarantee that IW never 
commits? Or is this a bug in IW?
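
To make the sequence above easier to follow, here is a small toy model of it in plain Java. It is only a sketch of my reading of the trace, not Lucene code: the class StaleCommitDataDemo, the method openThenClose and the map of generation to commit data are all made up for illustration, and the model assumes the directory holds at least two commits and that reading the newest one always fails.

```java
import java.util.TreeMap;

/** Toy model of the sequence above; it does not use or reproduce Lucene classes. */
public class StaleCommitDataDemo {

    /** Segments file name for a generation (generations are written in base 36). */
    static String segmentsFileName(long gen) {
        return "segments_" + Long.toString(gen, Character.MAX_RADIX);
    }

    /**
     * Models "open a writer, then close it" when the newest commit cannot be read.
     * commits maps generation -> commit data and must hold at least two entries.
     * If rollback is true, the writer is closed without committing.
     * Returns "<newest segments file> <its commit data>" as seen afterwards.
     */
    static String openThenClose(TreeMap<Long, String> commits, boolean rollback) {
        long newest = commits.lastKey();

        // 1. Reading the newest commit fails, so the writer falls back to the previous one.
        long loaded = commits.lowerKey(newest);
        String commitData = commits.get(loaded);

        // 2. The deleter sees every commit and keeps only the newest one.
        commits.headMap(newest).clear();
        boolean startingCommitDeleted = !commits.containsKey(loaded);

        // 3. Losing the starting commit counts as a pending change.
        int changeCount = startingCommitDeleted ? 1 : 0;

        // 4. Closing normally commits pending changes as generation loaded + 1,
        //    carrying the commit data of the commit that was actually loaded.
        if (!rollback && changeCount > 0) {
            commits.put(loaded + 1, commitData);
        }

        long latest = commits.lastKey();
        return segmentsFileName(latest) + " " + commits.get(latest);
    }

    public static void main(String[] args) {
        TreeMap<Long, String> commits = new TreeMap<>();
        commits.put(9L, "version=9");
        commits.put(10L, "version=a");
        System.out.println(openThenClose(commits, false)); // segments_a version=9

        commits.clear();
        commits.put(9L, "version=9");
        commits.put(10L, "version=a");
        System.out.println(openThenClose(commits, true));  // segments_a version=a
    }
}
```

In the first run the newest commit ends up carrying the older commit data, which is the mismatch the handler trips over. In the second run nothing is committed on close, so the replicated commit data survives, which matches the test passing with rollback().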
                
> Add Replication module to Lucene
> --------------------------------
>
>                 Key: LUCENE-4975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4975
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, 
> LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch
>
>
> I wrote a replication module which I think will be useful to Lucene users who 
> want to replicate their indexes, e.g. for high availability, taking hot 
> backups, etc.
> I will upload a patch soon where I'll describe in general how it works.
