Potential design improvements for ActiveStandbyElector API
----------------------------------------------------------

                 Key: HADOOP-8205
                 URL: https://issues.apache.org/jira/browse/HADOOP-8205
             Project: Hadoop Common
          Issue Type: Improvement
          Components: ha
            Reporter: Todd Lipcon


Bikas suggested some improvements to the API for ActiveStandbyElector in 
HADOOP-8163:
{quote}

I have a feeling that putting the fencing concept into the elector is diluting 
the distinctness between the elector and the failover controller. In my mind, 
the elector is a distributed leader election library that signals candidates 
about being made leader or standby. In the ideal world, where the HA service 
behaves perfectly and does not execute any instruction unless it is a leader, 
we only need the elector. But the world is not ideal and we can have errant 
leaders that need to be fenced etc. Here is where the failover controller comes 
in. It manages the HA service by using the elector to do distributed leader 
selection and get those notifications passed on to the HA service. In addition, 
it guards service sanity by making sure that the signal is passed only when it 
is safe to do so. 

How about this slightly different alternative flow? The elector gets the leader 
lock. For all intents and purposes it is the new leader. It passes the signal 
to the failover controller with the breadcrumb of the last leader:
appClient->becomeActive(breadcrumb);
The failover controller now has to ensure that all previous masters are fenced 
before making its service the master. The breadcrumb is an optimization that 
lets it know that such an operation may not be necessary. If it is necessary, 
then it performs fencing. If fencing is successful, it calls 
elector->becameActive() or elector->transitionedToActive(), at which point the 
elector can overwrite the breadcrumb with its own info. I haven't thought 
through whether this should be called before or after a successful call to 
HAService->transitionToActive(), but my gut feeling is for the former.
This keeps the notion of fencing inside the controller instead of being in both 
the elector and the controller.

Secondly, we are performing blocking calls in the ZKClient callback, which 
runs on the ZK threads. It is advisable not to block ZK client threads for 
long. The create and delete methods might be OK, but I would try to move the 
fencing and transition-to-active operations away from the ZK thread, i.e. when 
the FailoverController is notified about becoming master, it returns from the 
call and then processes fencing/transitioning on some other thread/threadpool. 
The above flow allows for this.
{quote}
This JIRA is to further discuss/implement these suggestions.
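
To make the quoted proposal concrete, here is a minimal Java sketch of the 
flow, under stated assumptions. All names (ElectorCallback, SketchElector, 
SketchFailoverController, fence) are hypothetical and are not the existing 
ActiveStandbyElector API. ZooKeeper is not used; onLockAcquired() only stands 
in for the moment the elector wins the leader lock. The sketch assumes a null 
breadcrumb means no fencing is needed, and it calls 
elector.transitionedToActive() before the service transition, following the 
"gut feeling" in the quote.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical callback the elector uses to signal the failover controller.
interface ElectorCallback {
    // Invoked on the elector's (ZK) thread, so it must return quickly.
    void becomeActive(String breadcrumb);
}

// Stand-in for the elector: it only tracks the breadcrumb of the last leader.
class SketchElector {
    private final String myId;
    private volatile String breadcrumb;

    SketchElector(String myId, String previousLeader) {
        this.myId = myId;
        this.breadcrumb = previousLeader;
    }

    // Simulates winning the leader lock: pass the old breadcrumb along.
    void onLockAcquired(ElectorCallback callback) {
        callback.becomeActive(breadcrumb);
    }

    // Called by the controller once fencing succeeded; overwrite the breadcrumb.
    void transitionedToActive() {
        breadcrumb = myId;
    }

    String getBreadcrumb() {
        return breadcrumb;
    }
}

// Stand-in for the failover controller: fencing lives here, not in the elector.
class SketchFailoverController implements ElectorCallback {
    private final SketchElector elector;
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    SketchFailoverController(SketchElector elector) {
        this.elector = elector;
    }

    @Override
    public void becomeActive(String breadcrumb) {
        // Hand the slow work to another thread and return immediately.
        worker.submit(() -> {
            // Assumption: a null breadcrumb means there is nobody to fence.
            if (breadcrumb != null && !fence(breadcrumb)) {
                events.add("fencingFailed:" + breadcrumb);
                return; // do not become active if fencing failed
            }
            elector.transitionedToActive();   // elector records the new leader
            events.add("transitionToActive"); // placeholder for the HA service call
        });
    }

    // Placeholder fencing step; a real controller would run a fencing method.
    private boolean fence(String target) {
        events.add("fence:" + target);
        return true;
    }

    // Waits for queued work to finish; returns false on timeout or interrupt.
    boolean shutdownAndWait() {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class ElectorFlowDemo {
    public static void main(String[] args) {
        SketchElector elector = new SketchElector("nn2", "nn1");
        SketchFailoverController controller = new SketchFailoverController(elector);
        elector.onLockAcquired(controller); // returns without blocking on fencing
        controller.shutdownAndWait();
        System.out.println(controller.events + " breadcrumb=" + elector.getBreadcrumb());
    }
}
```

Because becomeActive() only enqueues work, the thread delivering the callback 
is never blocked on fencing or on the service transition. What should happen 
when fencing fails (for example, giving up the lock and rejoining the 
election) is left open here, as it is in the proposal.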


        
