[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561310#action_12561310 ]

V.Narayanan commented on DERBY-3254:

I ran the tests on this patch and had a clean run of the junit all suite.

Implement the replication failover functionality
    Key: DERBY-3254
    URL: https://issues.apache.org/jira/browse/DERBY-3254
    Project: Derby
    Issue Type: Sub-task
    Components: Replication
    Reporter: V.Narayanan
    Assignee: V.Narayanan
    Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat, failover_impl_v2.diff, failover_impl_v2.stat, failover_impl_v3.diff, failover_impl_v3.stat, failover_impl_v4.diff, failover_impl_v4.stat, failover_impl_v5.diff, failover_impl_v5.stat, failover_impl_v6.diff, failover_impl_v6.stat

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560983#action_12560983 ]

Øystein Grøvlen commented on DERBY-3254:

I do not understand the change to MasterController#startFailover. It seems that handleFailoverFailure will now be called in all cases. Also, exceptions thrown by handleFailoverFailure when it is called from the try block will be caught and passed to handleFailoverFailure again by the catch block. That seems a bit unnecessary. I think the whole handling of the ack, as it was in v4 of the patch, should be moved outside the try block.

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat, failover_impl_v2.diff, failover_impl_v2.stat, failover_impl_v3.diff, failover_impl_v3.stat, failover_impl_v4.diff, failover_impl_v4.stat, failover_impl_v5.diff, failover_impl_v5.stat
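The structure suggested in this comment can be sketched roughly as follows. This is only an illustration of the control flow, not the code in the patch: FailoverOutcome, SlaveLink, and the failureHandlerCalls counter are hypothetical names made up for the sketch, and only the method names startFailover and handleFailoverFailure come from the discussion.

```java
import java.io.IOException;

/** Hypothetical exception type used only in this sketch. */
class FailoverOutcome extends Exception {
    final boolean success;

    FailoverOutcome(boolean success, String message, Throwable cause) {
        super(message, cause);
        this.success = success;
    }
}

/** Hypothetical stand-in for the connection to the slave. */
interface SlaveLink {
    /** Sends the failover request and reports whether the slave acknowledged it. */
    boolean sendFailoverAndAwaitAck() throws IOException;
}

class StartFailoverSketch {
    int failureHandlerCalls = 0;

    /** Always throws; counts its calls so the control flow can be checked. */
    void handleFailoverFailure(Throwable cause) throws FailoverOutcome {
        failureHandlerCalls++;
        throw new FailoverOutcome(false, "failover unsuccessful", cause);
    }

    /** Throws on success as well as on failure, as discussed in this thread. */
    void startFailover(SlaveLink link) throws FailoverOutcome {
        boolean acked = false;
        try {
            // Only the network exchange is guarded by the try block.
            acked = link.sendFailoverAndAwaitAck();
        } catch (IOException e) {
            handleFailoverFailure(e);
        }
        // Ack handling sits outside the try block, so an exception thrown
        // by handleFailoverFailure is never caught and handled a second time.
        if (!acked) {
            handleFailoverFailure(null);
        }
        throw new FailoverOutcome(true, "failover successful", null);
    }
}
```

With this layout the failure handler runs exactly once on a failed failover and not at all on a successful one, for example when calling `new StartFailoverSketch().startFailover(() -> true)`.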
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560787#action_12560787 ]

V.Narayanan commented on DERBY-3254:

1) The exception thrown upon successful failover in MasterController#startFailover needs to be moved outside the try/catch block.

2) If failover is successful, AsynchronousLogShipper#stopLogShipment needs to be called to terminate the log shipper thread.

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat, failover_impl_v2.diff, failover_impl_v2.stat, failover_impl_v3.diff, failover_impl_v3.stat, failover_impl_v4.diff, failover_impl_v4.stat
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560520#action_12560520 ]

Øystein Grøvlen commented on DERBY-3254:

With the latest patch, failover_impl_v4.diff, I get an error in the error code test. It is probably related to the fact that a database severity error has been added.

While fixing this, here are some minor issues that should also be addressed:

- Update the javadoc of MasterFactory/MasterController#startFailover to indicate that it will throw an exception also on success.
- Some unnecessary imports (Property, SQLException).
- The text of the javadoc for LogToFile#stopReplicationSlaveRole could still be improved.

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat, failover_impl_v2.diff, failover_impl_v2.stat, failover_impl_v3.diff, failover_impl_v3.stat, failover_impl_v4.diff, failover_impl_v4.stat
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559458#action_12559458 ]

Øystein Grøvlen commented on DERBY-3254:

Thanks for the patch, Narayanan. Here are my comments:

1. Instead of using Database#freeze, I think you should use RawStoreFactory#freeze, since MasterController relates to the store and not to the SQL layer. This also removes the need for importing SQLException.

2. Maybe I am wrong, but it seems to me that you are shutting down the entire system. At least, you do not specify which database to shut down. Instead of an explicit shutdown, I think you should consider just using database severity for the exception you throw. I think that will make the connection close down the database automatically.

3. MasterController#handleFailoverFailure: Don't you mean to use REPLICATION_FAILOVER_UNSUCCESSFUL also for the else part?

4. Some of my comments on the previous version of the patch do not seem to have been addressed.

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat, failover_impl_v2.diff, failover_impl_v2.stat, failover_impl_v3.diff, failover_impl_v3.stat
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558198#action_12558198 ]

V.Narayanan commented on DERBY-3254:

Before starting to make the patch, there are a few things I thought I should spell out.

I had earlier decided on the following set of steps to implement failover:

* The failover command is given to the master.
* The master flushes the log buffer.
* The master sends this command to the slave and waits for a response.
* The slave responds with an acknowledgement.
* The master stops replication.

A few refinements to these steps become necessary because of the following issues:

1) When the master stops replication, is it necessary for it to shut down the database?

I believe the answer is YES, because there is no point in having the master serve clients when the slave is doing likewise for the same database. Having two databases serving clients would create trouble for the users.

2) In the steps above there is a window between stopping the master (without shutting down the database), sending the failover command to the slave, having that attempt fail, and restarting the master. Stopping the master flushes the log buffer and stops the log buffer from buffering more records, but it does not stop clients from being served. So the next time you start replication, the master and slave would be inconsistent. Therefore we need to stop the clients in some way before flushing the log buffer.

These two issues lead to the following refinements of the steps mentioned earlier:

* The failover command is given to the master.
* We stop the clients upon receiving this command.
* The master flushes the log buffer.
* The master sends the failover command to the slave and waits for a response.
* The slave responds with an acknowledgement.
* The master stops replication and shuts down the database. In the event of a failure, the master resumes serving clients.
Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat
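The refined sequence of steps in the comment above can be sketched as a small master-side routine. This is a minimal illustration under stated assumptions, not the patch itself: RefinedFailoverSketch, its fields, and the trace strings are hypothetical, and the slave's reply is modeled as a simple BooleanSupplier rather than a network exchange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical model of the master; the names are not taken from Derby. */
class RefinedFailoverSketch {
    final List<String> trace = new ArrayList<>();
    boolean servingClients = true;
    boolean databaseUp = true;

    /**
     * Runs the refined steps in order. The supplier stands in for sending the
     * failover command to the slave and waiting for its acknowledgement.
     * Returns true if failover completed, false if the master resumed.
     */
    boolean failover(BooleanSupplier slaveAcknowledges) {
        // Clients are stopped before the flush so no new log records appear.
        servingClients = false;
        trace.add("stop clients");
        trace.add("flush log buffer");
        trace.add("send failover to slave");

        if (slaveAcknowledges.getAsBoolean()) {
            // Slave took over: stop replication and shut the database down.
            trace.add("stop replication");
            databaseUp = false;
            trace.add("shut down database");
            return true;
        }

        // No acknowledgement: the master resumes serving clients.
        servingClients = true;
        trace.add("resume clients");
        return false;
    }
}
```

Stopping the clients first is what closes the window described in issue 2: nothing can be written between the flush and the slave's reply, so a failed attempt leaves master and slave consistent.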
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557915#action_12557915 ]

V.Narayanan commented on DERBY-3254:

I will change this patch to do the following:

1) The master initiates failover by sending a failover message to the slave and waits for the acknowledgement from the slave. (The slave will send an acknowledgement if its attempt to fail over succeeds.)

2) If the acknowledgement is received, the master proceeds with failover.

3) Otherwise it continues as master without doing anything.

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat
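One way to picture the acknowledgement wait described above is with two in-memory queues standing in for the network connection. Everything here is an assumption made for illustration: the AckWaitSketch class, the Msg and Role enums, and in particular the timeout, which the comment does not mention.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; queues replace the real master/slave connection. */
class AckWaitSketch {
    enum Msg { FAILOVER, ACK }
    enum Role { MASTER, FAILED_OVER }

    /**
     * Sends FAILOVER to the slave and waits up to timeoutMillis for an ACK.
     * Returns FAILED_OVER only if the ACK arrives; in every other case the
     * caller simply continues as MASTER.
     */
    static Role requestFailover(BlockingQueue<Msg> toSlave,
                                BlockingQueue<Msg> fromSlave,
                                long timeoutMillis) {
        try {
            toSlave.put(Msg.FAILOVER);
            // poll returns null if nothing arrives before the timeout.
            Msg reply = fromSlave.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            return reply == Msg.ACK ? Role.FAILED_OVER : Role.MASTER;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stay master.
            Thread.currentThread().interrupt();
            return Role.MASTER;
        }
    }
}
```

A caller would pass, for instance, two `LinkedBlockingQueue<AckWaitSketch.Msg>` instances; a slave thread that succeeds in failing over would put `ACK` on the second queue.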
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557916#action_12557916 ]

V.Narayanan commented on DERBY-3254:

I am going to remove the stopSlave method implementation I had added here and move it to the stop issue, which needs to be reopened to address the comments there. The slave issue seemed to me the better context in which to address it.

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat
[jira] Commented: (DERBY-3254) Implement the replication failover functionality
[ https://issues.apache.org/jira/browse/DERBY-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1221#action_1221 ]

Øystein Grøvlen commented on DERBY-3254:

Thanks for the patch, Narayanan.

My main question about this patch concerns failure handling during failover. If failover is not successful, I think the current master should continue as master. Also, I am not sure that merely being able to send the failover message is sufficient to decide that failover was successful. Maybe some acknowledgement from the slave is needed?

As it is, the implementation of stop and failover is identical at the slave. I guess it is the implementation of stop that is missing something?

Some minor issues:

- LogToFile#stopReplicationSlaveRole(): I think the javadoc here is a bit inaccurate. AFAIU, setting the inReplicationSlaveMode flag will make the slave complete recovery and boot the database.
- There is a double ; in SlaveController#failover.
- I think a successful failover should also be recorded in derby.log at the (former) slave.
- There is a typo in the message text for R011: perfomed

Attachments: failover_impl_notforcommit.diff, failover_impl_notforcommit.stat, failover_impl_v1.diff, failover_impl_v1.stat