[ 
https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-9220:
------------------------------

    Attachment: HADOOP-9220.patch

I've written a test that fails without the patch. It checks that the HA 
service transitions to active the expected number of times.

There is another part to the fix, in addition to the previous patch. In 
ZKFailoverController#recheckElectability() the check may be postponed if the FC 
has ceded its active state and is waiting for a timeout (10s) before rejoining 
the election. The trouble is that the FC may have become active again in the 
intervening time, but the postponed recheckElectability() doesn't take this 
into account: it still calls ActiveStandbyElector#createLockNodeAsync, so the 
FC transitions to standby and then to active again. The fix I have implemented 
makes a postponed recheckElectability() join the election only if the FC is 
not currently active.
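
To illustrate the race and the guard described above, here is a minimal 
sketch in Java. It is a toy model, not the code in the patch: the class 
FailoverControllerSketch, its methods, and the skipRejoinWhenActive flag are 
all hypothetical, and it assumes that rejoining the election while already 
active makes the node give up the lock and then win it back.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical stand-in, not Hadoop's ZKFailoverController.
class FailoverControllerSketch {
    enum State { STANDBY, ACTIVE }

    private State state = State.STANDBY;
    private final List<State> transitions = new ArrayList<>();
    // true models the guard described in the message; false models the old behaviour
    private final boolean skipRejoinWhenActive;

    FailoverControllerSketch(boolean skipRejoinWhenActive) {
        this.skipRejoinWhenActive = skipRejoinWhenActive;
    }

    void becomeActive() {
        transitionTo(State.ACTIVE);
    }

    // Records a transition only when the state actually changes.
    private void transitionTo(State next) {
        if (state != next) {
            state = next;
            transitions.add(next);
        }
    }

    // Assumption: rejoining while active drops the lock and then re-acquires it.
    private void joinElection() {
        transitionTo(State.STANDBY);
        transitionTo(State.ACTIVE);
    }

    // Models the recheck that fires after the postponement timeout.
    void postponedRecheck() {
        if (skipRejoinWhenActive && state == State.ACTIVE) {
            return; // already active, so there is nothing to rejoin
        }
        joinElection();
    }

    // Number of times this controller became active, as the test would count it.
    long activeTransitions() {
        return transitions.stream().filter(s -> s == State.ACTIVE).count();
    }

    public static void main(String[] args) {
        for (boolean patched : new boolean[] {false, true}) {
            FailoverControllerSketch fc = new FailoverControllerSketch(patched);
            fc.becomeActive();     // FC became active during the wait
            fc.postponedRecheck(); // the postponed check then fires
            System.out.println("patched=" + patched + " transitions=" + fc.transitions);
        }
    }
}
```

In this model the unguarded run records active, standby, active, while the 
guarded run records a single transition to active, which is the kind of count 
the test asserts on.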
                
> Unnecessary transition to standby in ActiveStandbyElector
> ---------------------------------------------------------
>
>                 Key: HADOOP-9220
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9220
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: HADOOP-9220.patch, HADOOP-9220.patch
>
>
> When performing a manual failover from one HA node to a second, under some 
> circumstances the second node will transition from standby -> active -> 
> standby -> active. This is with automatic failover enabled, so there is a ZK 
> cluster doing leader election.

