[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector

2013-05-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-9220:


       Resolution: Fixed
    Fix Version/s: 2.0.5-beta
                   3.0.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed to branch-2 and trunk. Thanks, Tom, for tracking this down.

 Unnecessary transition to standby in ActiveStandbyElector
 -

 Key: HADOOP-9220
 URL: https://issues.apache.org/jira/browse/HADOOP-9220
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Reporter: Tom White
Assignee: Tom White
Priority: Critical
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HADOOP-9220.patch, HADOOP-9220.patch, hadoop-9220.txt


 When performing a manual failover from one HA node to a second, under some
 circumstances the second node will transition from standby -> active ->
 standby -> active. This is with automatic failover enabled, so there is a ZK
 cluster doing leader election.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector

2013-05-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-9220:


Attachment: hadoop-9220.txt

Hey Tom. I think you can just use the {{wantsToBeInElection}} variable here 
rather than going back to ZK to check the current state of the znode (which I 
fear might be race-prone). How does this patch look to you? It seems to pass 
the test you added.
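
A minimal sketch of that idea, not the attached patch: the elector keeps a
local flag recording whether it has already joined, and a repeated join request
returns early instead of asking ZK about the znode. The class name
FlagGuardedElector and the lockCreateAttempts counter are hypothetical and only
stand in for the real elector and its asynchronous znode create.

```java
/** Hypothetical stand-in for the elector; not the real ActiveStandbyElector. */
class FlagGuardedElector {
    // Local record of whether this node has already joined the election.
    private boolean wantsToBeInElection = false;
    // Counts simulated attempts to create the lock znode.
    private int lockCreateAttempts = 0;

    synchronized void joinElection() {
        if (wantsToBeInElection) {
            // Already in the election: skip the redundant create attempt.
            return;
        }
        wantsToBeInElection = true;
        lockCreateAttempts++; // stands in for the asynchronous znode create
    }

    synchronized void quitElection() {
        wantsToBeInElection = false;
    }

    synchronized int getLockCreateAttempts() {
        return lockCreateAttempts;
    }
}
```

Because the decision relies only on local state, it does not depend on a
second read of the znode that could race with other participants.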

 Unnecessary transition to standby in ActiveStandbyElector
 -

 Key: HADOOP-9220
 URL: https://issues.apache.org/jira/browse/HADOOP-9220
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Reporter: Tom White
Assignee: Tom White
Priority: Critical
 Attachments: HADOOP-9220.patch, HADOOP-9220.patch, hadoop-9220.txt


 When performing a manual failover from one HA node to a second, under some
 circumstances the second node will transition from standby -> active ->
 standby -> active. This is with automatic failover enabled, so there is a ZK
 cluster doing leader election.



[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector

2013-01-24 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HADOOP-9220:


Priority: Critical  (was: Major)

 Unnecessary transition to standby in ActiveStandbyElector
 -

 Key: HADOOP-9220
 URL: https://issues.apache.org/jira/browse/HADOOP-9220
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Reporter: Tom White
Assignee: Tom White
Priority: Critical
 Attachments: HADOOP-9220.patch, HADOOP-9220.patch


 When performing a manual failover from one HA node to a second, under some
 circumstances the second node will transition from standby -> active ->
 standby -> active. This is with automatic failover enabled, so there is a ZK
 cluster doing leader election.



[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector

2013-01-18 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-9220:
--

Attachment: HADOOP-9220.patch

I've written a test which fails without the patch. Basically it checks that the 
number of times that the HA service transitions to active is as expected.

There is another part to the fix, in addition to the previous patch. In 
ZKFailoverController#recheckElectability() the check may be postponed if the FC 
has ceded its active state and is waiting for a timeout (10s) before rejoining 
the election. The trouble is that the FC may have become active again in the 
intervening time, but recheckElectability() doesn't take account of this (and 
will call ActiveStandbyElector#createLockNodeAsync), and so the FC will 
transition to standby and then to active again. The fix I have implemented 
changes a postponed recheckElectability() to check if the FC is not currently 
active before joining the election.
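
To illustrate both the test idea and the guard, here is a toy model rather than
the patch itself. It assumes a hypothetical service that only counts
transitions and a hypothetical controller whose postponed recheck either skips
rejoining when the service is already active or does not. The
standby-then-active bounce is hard-coded as a simplification of what happens
when the lock node is created again.

```java
/** Hypothetical HA service that only records its state transitions. */
class TransitionCountingService {
    private boolean active = false;
    private int activeTransitions = 0;

    void transitionToActive() {
        active = true;
        activeTransitions++;
    }

    void transitionToStandby() {
        active = false;
    }

    boolean isActive() {
        return active;
    }

    int getActiveTransitions() {
        return activeTransitions;
    }
}

/** Toy model of a failover controller with a postponed electability recheck. */
class PostponedRecheckSketch {
    private final TransitionCountingService service;
    private final boolean skipWhenActive;

    PostponedRecheckSketch(TransitionCountingService service, boolean skipWhenActive) {
        this.service = service;
        this.skipWhenActive = skipWhenActive;
    }

    /** Models this node winning the election while the recheck is still pending. */
    void wonElection() {
        service.transitionToActive();
    }

    /** Models the postponed recheck firing once the delay has elapsed. */
    void postponedRecheck() {
        if (skipWhenActive && service.isActive()) {
            return; // already active: do not rejoin the election
        }
        // Simplification: rejoining is modelled directly as the
        // standby-then-active bounce described above.
        service.transitionToStandby();
        service.transitionToActive();
    }
}
```

A test in the spirit of the one described would call wonElection() and then
postponedRecheck(), and compare getActiveTransitions() with the expected count.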

 Unnecessary transition to standby in ActiveStandbyElector
 -

 Key: HADOOP-9220
 URL: https://issues.apache.org/jira/browse/HADOOP-9220
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Reporter: Tom White
Assignee: Tom White
 Attachments: HADOOP-9220.patch, HADOOP-9220.patch


 When performing a manual failover from one HA node to a second, under some
 circumstances the second node will transition from standby -> active ->
 standby -> active. This is with automatic failover enabled, so there is a ZK
 cluster doing leader election.



[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector

2013-01-18 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-9220:
--

Status: Patch Available  (was: Open)

 Unnecessary transition to standby in ActiveStandbyElector
 -

 Key: HADOOP-9220
 URL: https://issues.apache.org/jira/browse/HADOOP-9220
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Reporter: Tom White
Assignee: Tom White
 Attachments: HADOOP-9220.patch, HADOOP-9220.patch


 When performing a manual failover from one HA node to a second, under some
 circumstances the second node will transition from standby -> active ->
 standby -> active. This is with automatic failover enabled, so there is a ZK
 cluster doing leader election.



[jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector

2013-01-17 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-9220:
--

Attachment: HADOOP-9220.patch

This behaviour occurs because there can be multiple watchers registered for a
given ZK client in ActiveStandbyElector. (The monitorLockNodeAsync() method
creates a new watcher object for the existing ZK client.)

This can cause multiple invocations of joinElectionInternal() for a single
watch event, each of which will make a call to create the lock znode. The first
call will cause a transition to active, while subsequent ones will cause a
transition to standby (in the isNodeExists clause of the processResult()
method). In a manual failover scenario the node will still transition to active
again, since the other node has ceded from the election for 10s, but it's still
an unnecessary transition that could be eliminated.

I did some manual testing with the attached patch, and the extra transition was 
avoided. I'll see if I can write a unit test for it.
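
The effect can be reproduced in a toy model without ZK. The sketch below is
illustrative only and is not the attached patch: FakeWatchClient,
LockEventWatcher and WatcherDemo are hypothetical names, and the fake client is
assumed to notify each distinct registered watcher object once per event.
Creating a new watcher on every monitor call leads to one join call per
watcher, while reusing a single watcher object leads to one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback type standing in for a ZK watcher. */
interface LockEventWatcher {
    void process(String event);
}

/** Hypothetical client that notifies each distinct watcher once per event. */
class FakeWatchClient {
    private final List<LockEventWatcher> watchers = new ArrayList<>();

    void register(LockEventWatcher watcher) {
        if (!watchers.contains(watcher)) { // identity comparison by default
            watchers.add(watcher);
        }
    }

    void fire(String event) {
        for (LockEventWatcher watcher : new ArrayList<>(watchers)) {
            watcher.process(event);
        }
    }
}

/** Toy elector that counts how often a watch event triggers a join. */
class WatcherDemo {
    private final FakeWatchClient client = new FakeWatchClient();
    private final boolean reuseWatcher;
    private LockEventWatcher sharedWatcher;
    int joinCalls = 0;

    WatcherDemo(boolean reuseWatcher) {
        this.reuseWatcher = reuseWatcher;
    }

    private LockEventWatcher newWatcher() {
        return new LockEventWatcher() {
            @Override
            public void process(String event) {
                joinCalls++; // stands in for joinElectionInternal()
            }
        };
    }

    /** Models a monitor call that sets a watch on the lock node. */
    void monitorLockNode() {
        if (reuseWatcher) {
            if (sharedWatcher == null) {
                sharedWatcher = newWatcher();
            }
            client.register(sharedWatcher);
        } else {
            client.register(newWatcher()); // a fresh watcher on every call
        }
    }

    void fireLockNodeEvent() {
        client.fire("NodeDeleted");
    }
}
```

Calling monitorLockNode() twice and then fireLockNodeEvent() shows the
difference between the two modes in joinCalls.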


 Unnecessary transition to standby in ActiveStandbyElector
 -

 Key: HADOOP-9220
 URL: https://issues.apache.org/jira/browse/HADOOP-9220
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Reporter: Tom White
Assignee: Tom White
 Attachments: HADOOP-9220.patch


 When performing a manual failover from one HA node to a second, under some
 circumstances the second node will transition from standby -> active ->
 standby -> active. This is with automatic failover enabled, so there is a ZK
 cluster doing leader election.
