[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-9220:

    Resolution: Fixed
    Fix Version/s: 2.0.5-beta, 3.0.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed to branch-2 and trunk. Thanks, Tom, for tracking this down.

Unnecessary transition to standby in ActiveStandbyElector

    Key: HADOOP-9220
    URL: https://issues.apache.org/jira/browse/HADOOP-9220
    Project: Hadoop Common
    Issue Type: Bug
    Components: ha
    Reporter: Tom White
    Assignee: Tom White
    Priority: Critical
    Fix For: 3.0.0, 2.0.5-beta
    Attachments: HADOOP-9220.patch, HADOOP-9220.patch, hadoop-9220.txt

When performing a manual failover from one HA node to a second, under some circumstances the second node will transition from standby -> active -> standby -> active. This is with automatic failover enabled, so there is a ZK cluster doing leader election.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-9220:

    Attachment: hadoop-9220.txt

Hey Tom. I think you can just use the {{wantsToBeInElection}} variable here rather than going back to ZK to check the current state of the znode (which I fear might be race-prone). How does this patch look to you? It seems to pass the test you added.

Unnecessary transition to standby in ActiveStandbyElector

    Key: HADOOP-9220
    URL: https://issues.apache.org/jira/browse/HADOOP-9220
    Project: Hadoop Common
    Issue Type: Bug
    Components: ha
    Reporter: Tom White
    Assignee: Tom White
    Priority: Critical
    Attachments: HADOOP-9220.patch, HADOOP-9220.patch, hadoop-9220.txt

When performing a manual failover from one HA node to a second, under some circumstances the second node will transition from standby -> active -> standby -> active. This is with automatic failover enabled, so there is a ZK cluster doing leader election.
[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-9220:

    Priority: Critical (was: Major)

Unnecessary transition to standby in ActiveStandbyElector

    Key: HADOOP-9220
    URL: https://issues.apache.org/jira/browse/HADOOP-9220
    Project: Hadoop Common
    Issue Type: Bug
    Components: ha
    Reporter: Tom White
    Assignee: Tom White
    Priority: Critical
    Attachments: HADOOP-9220.patch, HADOOP-9220.patch

When performing a manual failover from one HA node to a second, under some circumstances the second node will transition from standby -> active -> standby -> active. This is with automatic failover enabled, so there is a ZK cluster doing leader election.
[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-9220:

    Attachment: HADOOP-9220.patch

I've written a test which fails without the patch. It checks that the HA service transitions to active the expected number of times.

There is another part to the fix, in addition to the previous patch. In ZKFailoverController#recheckElectability() the check may be postponed if the FC has ceded its active state and is waiting for a timeout (10s) before rejoining the election. The trouble is that the FC may have become active again in the intervening time, but recheckElectability() doesn't take account of this (and will call ActiveStandbyElector#createLockNodeAsync), and so the FC will transition to standby and then to active again. The fix I have implemented changes a postponed recheckElectability() so that it joins the election only if the FC is not currently active.

Unnecessary transition to standby in ActiveStandbyElector

    Key: HADOOP-9220
    URL: https://issues.apache.org/jira/browse/HADOOP-9220
    Project: Hadoop Common
    Issue Type: Bug
    Components: ha
    Reporter: Tom White
    Assignee: Tom White
    Attachments: HADOOP-9220.patch, HADOOP-9220.patch

When performing a manual failover from one HA node to a second, under some circumstances the second node will transition from standby -> active -> standby -> active. This is with automatic failover enabled, so there is a ZK cluster doing leader election.
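The postponed-recheck problem described in this message can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not the Hadoop code or the attached patch, and every class, method, and flag name in it (ToyController, skipIfActive, runPostponedRecheck, and so on) is hypothetical. It assumes, as the message describes, that rejoining the election while already active produces a standby transition followed by a new active transition.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names exist in Hadoop.
class PostponedRecheckSketch {

    /** Stand-in for a failover controller that records its state transitions. */
    static class ToyController {
        final List<String> transitions = new ArrayList<>();
        private final boolean skipIfActive; // hypothetical switch for the guarded behaviour
        private boolean active = false;

        ToyController(boolean skipIfActive) {
            this.skipIfActive = skipIfActive;
        }

        /** The controller wins the election (no competitor in this toy model). */
        void winElection() {
            active = true;
            transitions.add("active");
        }

        /** The recheck that was delayed while the controller had ceded. */
        void runPostponedRecheck() {
            if (skipIfActive && active) {
                return; // already active: rejoining would only cause churn
            }
            rejoinElection();
        }

        private void rejoinElection() {
            if (active) {
                // Models the effect described in the message: the create call
                // finds the lock node already present, so the node goes standby.
                active = false;
                transitions.add("standby");
            }
            winElection();
        }
    }

    /** Runs the scenario: active again during the delay, then the recheck fires. */
    static List<String> simulate(boolean skipIfActive) {
        ToyController c = new ToyController(skipIfActive);
        c.winElection();         // became active again while the recheck was postponed
        c.runPostponedRecheck(); // the delayed recheck finally runs
        return c.transitions;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // unguarded recheck
        System.out.println(simulate(true));  // recheck skipped when already active
    }
}
```

In this model the unguarded run records an extra standby and active pair, while the guarded run records a single active transition. The real controller involves ZooKeeper callbacks and timers that the sketch leaves out.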
[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-9220:

    Status: Patch Available (was: Open)

Unnecessary transition to standby in ActiveStandbyElector

    Key: HADOOP-9220
    URL: https://issues.apache.org/jira/browse/HADOOP-9220
    Project: Hadoop Common
    Issue Type: Bug
    Components: ha
    Reporter: Tom White
    Assignee: Tom White
    Attachments: HADOOP-9220.patch, HADOOP-9220.patch

When performing a manual failover from one HA node to a second, under some circumstances the second node will transition from standby -> active -> standby -> active. This is with automatic failover enabled, so there is a ZK cluster doing leader election.
[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-9220:

    Attachment: HADOOP-9220.patch

This behaviour occurs because there can be multiple watchers registered for a given ZK client in ActiveStandbyElector. (The monitorLockNodeAsync() method creates a new watcher object for the existing ZK client.) This can cause multiple invocations of joinElectionInternal() for a single watch event, each of which will make a call to create the lock znode. The first call will cause a transition to active, while subsequent ones will cause a transition to standby (in the isNodeExists clause of the processResult() method). In a manual failover scenario the node will still transition to active again, since the other node has ceded from the election for 10s, but it's still an unnecessary transition that could be eliminated.

I did some manual testing with the attached patch, and the extra transition was avoided. I'll see if I can write a unit test for it.

Unnecessary transition to standby in ActiveStandbyElector

    Key: HADOOP-9220
    URL: https://issues.apache.org/jira/browse/HADOOP-9220
    Project: Hadoop Common
    Issue Type: Bug
    Components: ha
    Reporter: Tom White
    Assignee: Tom White
    Attachments: HADOOP-9220.patch

When performing a manual failover from one HA node to a second, under some circumstances the second node will transition from standby -> active -> standby -> active. This is with automatic failover enabled, so there is a ZK cluster doing leader election.
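The duplicate-watcher effect described in this message can be reproduced in a toy model. The sketch below is an illustration under stated assumptions, not the ActiveStandbyElector code and not the attached patch: ToyElector, registerOnce, and the other names are hypothetical. It also uses a simple "register at most one watcher" guard, which is one possible way to avoid the duplicate call and may differ from the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names exist in Hadoop or ZooKeeper.
class DuplicateWatcherSketch {

    /** Stand-in for an elector that records its state transitions. */
    static class ToyElector {
        final List<String> transitions = new ArrayList<>();
        private final List<Runnable> watchers = new ArrayList<>();
        private final boolean registerOnce; // hypothetical guard against duplicate watchers
        private boolean lockNodeExists = false;

        ToyElector(boolean registerOnce) {
            this.registerOnce = registerOnce;
        }

        /** Registers a watcher on the lock node; unguarded calls stack up. */
        void monitorLockNode() {
            if (registerOnce && !watchers.isEmpty()) {
                return; // a watcher is already pending, do not add another
            }
            watchers.add(this::joinElection);
        }

        /** Tries to create the lock node, as each watcher callback does. */
        void joinElection() {
            if (!lockNodeExists) {
                lockNodeExists = true;
                transitions.add("active");  // create succeeded
            } else {
                transitions.add("standby"); // create failed: node already exists
            }
        }

        /** Delivers one "lock node deleted" event to every registered watcher. */
        void fireLockNodeDeleted() {
            lockNodeExists = false;
            List<Runnable> toRun = new ArrayList<>(watchers);
            watchers.clear(); // ZooKeeper watches are one-shot
            for (Runnable w : toRun) {
                w.run();
            }
        }
    }

    /** Runs the scenario: two registrations, then a single watch event. */
    static List<String> simulate(boolean registerOnce) {
        ToyElector e = new ToyElector(registerOnce);
        e.lockNodeExists = true; // the other node currently holds the lock
        e.monitorLockNode();
        e.monitorLockNode();     // second registration on the same client
        e.fireLockNodeDeleted(); // the other node cedes; one event is delivered
        return e.transitions;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // duplicate watchers
        System.out.println(simulate(true));  // single watcher
    }
}
```

With duplicate watchers the model records active followed by standby for a single event; with the guard it records only active. The model stops there and does not show the later return to active that the message describes for the manual failover case.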