[ https://issues.apache.org/jira/browse/DERBY-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dag H. Wanvik updated DERBY-4186:
---------------------------------
Attachment: derby-4186.diff
ok-slave.txt
bad-slave.txt
Further analysis shows that the uncaught IOException I referred to, seen
when trying to send a stop message to the slave, is just part of the generic
MasterController.tearDownNetwork path that runs when establishment of the
master fails initially.
The test correctly loops to wait for successful startup of the master, and
there will be some failed attempts (while waiting for the slave to be ready)
which visit that failure code path.
So it is not a problem, just (another) red herring.
I enclose a patch proposal which addresses the real issue: the slave does not
shut down when it should.
The scenario is a race condition: the slave receives a stop replication
message before it has had time to complete the slave boot; see the attachment
bad-slave.txt. If I make the test client wait instead of proceeding to stop
the master, the slave log looks like the one in ok-slave.txt (attached).
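To illustrate the kind of race described above, here is a generic sketch. It is
not the code in derby-4186.diff, and the class, enum and method names are made
up for illustration. The idea is that a stop request arriving during boot is
recorded and then honored once boot completes, instead of being lost:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; not Derby's slave controller. */
final class SlaveLifecycleSketch {

    enum State { BOOTING, RUNNING, STOP_REQUESTED, STOPPED }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.BOOTING);

    /** Called when a stop message arrives, possibly before boot has finished. */
    void onStopMessage() {
        // Still booting: only record the request; bootCompleted() acts on it.
        if (state.compareAndSet(State.BOOTING, State.STOP_REQUESTED)) {
            return;
        }
        // Already running: stop right away.
        state.compareAndSet(State.RUNNING, State.STOPPED);
    }

    /** Called by the boot thread once the slave boot has finished. */
    void bootCompleted() {
        // Normal case: no stop was requested while booting.
        if (state.compareAndSet(State.BOOTING, State.RUNNING)) {
            return;
        }
        // A stop arrived during boot: honor it now.
        state.compareAndSet(State.STOP_REQUESTED, State.STOPPED);
    }

    State currentState() {
        return state.get();
    }
}
```

With this shape, both orderings (stop before boot completes, stop after boot
completes) end in STOPPED.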
It would be nice if one of the original authors could have a look at this
patch, as I am not familiar with this code.
The patch also modifies the test client to loop until success, accepting the
intermediate state CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE.
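The looping behavior in the test client amounts to something like the sketch
below. The helper connectWithRetry and the Attempt interface are hypothetical
stand-ins (the real test uses its own helpers in ReplicationRun); only the
tolerated SQLState 08004 comes from the test.

```java
import java.sql.SQLException;

/** Hypothetical helper; not the code in derby-4186.diff. */
final class ConnectRetrySketch {

    /** One connection attempt; stands in for a real JDBC connect call. */
    interface Attempt<T> {
        T connect() throws SQLException;
    }

    /**
     * Retries until an attempt succeeds. An SQLException whose SQLState equals
     * toleratedState means "not ready yet"; any other SQLException is rethrown.
     */
    static <T> T connectWithRetry(Attempt<T> attempt, String toleratedState,
                                  long sleepMillis, int maxAttempts)
            throws SQLException, InterruptedException {
        SQLException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.connect();
            } catch (SQLException e) {
                if (!toleratedState.equals(e.getSQLState())) {
                    throw e; // unexpected state: fail immediately
                }
                last = e; // intermediate state: wait and try again
                Thread.sleep(sleepMillis);
            }
        }
        throw new SQLException(
                "no connection after " + maxAttempts + " attempts",
                toleratedState, last);
    }
}
```

A caller would pass "08004" as the tolerated state, so that a slave still in
slave mode causes a retry while any other failure surfaces at once.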
Running regressions.
> After failover, test fails when it succeeds in connecting early to failed over slave
> ------------------------------------------------------------------------------------
>
> Key: DERBY-4186
> URL: https://issues.apache.org/jira/browse/DERBY-4186
> Project: Derby
> Issue Type: Bug
> Components: Replication, Test
> Affects Versions: 10.6.0.0
> Reporter: Dag H. Wanvik
> Attachments: bad-slave.txt, derby-4186.diff, ok-slave.txt
>
>
> Occasionally I see this error in ReplicationRun_Local_3_p3:
> 1) testReplication_Local_3_p3_StateNegativeTests(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3)junit.framework.AssertionFailedError: Expected SQLState'08004', but got connection!
>     at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.waitForSQLState(ReplicationRun.java:332)
>     at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3.testReplication_Local_3_p3_StateNegativeTests(ReplicationRun_Local_3_p3.java:170)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
>     at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
>     at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
>     at junit.extensions.TestSetup.run(TestSetup.java:25)
> In the code, after a stopMaster is given to the master (which should lead to
> failover), the test expects to see CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE
> (08004.C.7), which will only succeed if the test gets to try to connect
> before the failover has started. This seems wrong. If the failover has
> completed, it should expect a successful connect (which boots the database,
> btw, since it is shut down after a successful failover).
> Quote from code:
>     waitForSQLState("08004", 100L, 20, // 08004.C.7 - CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE
>             slaveDatabasePath + FS + slaveDbSubPath + FS + replicatedDb,
>             slaveServerHost, slaveServerPort); // _failOver above fails...
> There is a race between the failover on the slave and the test here I think.