Feng Honghua created HBASE-10296:
------------------------------------

             Summary: Replace ZK with a paxos running within master processes 
to provide better master failover performance and state consistency
                 Key: HBASE-10296
                 URL: https://issues.apache.org/jira/browse/HBASE-10296
             Project: HBase
          Issue Type: Brainstorming
          Components: master, Region Assignment, regionserver
            Reporter: Feng Honghua


Currently the master relies on ZK to elect the active master, monitor liveness,
and store almost all of its state, such as region states, table info,
replication info and so on. ZK also serves as a channel for
master-regionserver communication (such as in region assignment) and
client-regionserver communication (such as replication state/behavior changes).
But ZK is fragile as a communication channel because of its one-time watches
and asynchronous notification mechanism, which together can lead to missed
events (and hence missed messages). For example, the master must rely on the
idempotence of the state transition logic to keep the region assignment state
machine correct. In fact, almost all of the trickiest inconsistency issues can
be traced back to the fragility of ZK as a communication channel.
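To make the missed-event problem concrete, here is a minimal sketch of a node with one-time watches. It is a toy model written for illustration, not the real ZooKeeper client API: the class `OneShotWatchNode`, its methods, and the region state names are all assumptions. It shows a watcher that never observes an intermediate state because two updates happen before it re-registers its watch.

```python
class OneShotWatchNode:
    """Toy model of a znode with one-time watches (not the real ZooKeeper API)."""

    def __init__(self, value):
        self.value = value
        self._watchers = []

    def get_and_watch(self, watcher_id):
        # Read the current value and register a watch that fires at most once.
        self._watchers.append(watcher_id)
        return self.value

    def set(self, value):
        self.value = value
        # Every registered watch fires once and is then discarded.
        fired, self._watchers = self._watchers, []
        # The caller delivers these notifications later (asynchronously).
        return fired


def run():
    node = OneShotWatchNode("OFFLINE")
    seen = [node.get_and_watch("master")]
    pending = []
    # A regionserver moves the region through two states back to back.
    pending += node.set("OPENING")
    pending += node.set("OPENED")  # no watch is registered at this point
    # Only now does the master process the single notification it received.
    # The notification carries no data, so the master has to re-read the node.
    for watcher_id in pending:
        seen.append(node.get_and_watch(watcher_id))
    return seen


if __name__ == "__main__":
    print(run())
```

In this model the master sees `OFFLINE` and then `OPENED`; the `OPENING` transition is never observed, which is why the transition logic has to tolerate skipped or repeated states.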
Replacing ZK with Paxos running within the master processes has the following benefits:
1. Better master failover performance: all masters, whether active or
standby, hold the same latest state in memory (except lagging ones, which
eventually catch up). Whenever the active master dies, the newly
elected active master can take over immediately, without failover work such
as rebuilding its in-memory state by consulting the meta table and ZK.
2. Better state consistency: the masters' in-memory state is the only source
of truth about the system, which eliminates inconsistency from the very
beginning. And although the state is held by all masters, Paxos guarantees
that they all apply the same sequence of updates, so the copies do not diverge.
3. A more direct and simple communication pattern: clients change state by
sending requests to the master, and the master and regionservers talk directly
to each other via requests and responses. None of them needs to go through
third-party storage like ZK, which can introduce more uncertainty, worse
latency and more complexity.
4. ZK would then be needed only for liveness monitoring, to determine whether
a regionserver is dead, and later on we can eliminate ZK entirely once we
build heartbeats between the master and regionservers.
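
The failover and consistency benefits in points 1 and 2 can be sketched with a toy replicated log. This is only an illustration under loud assumptions: the consensus step itself is not implemented, a plain Python list stands in for the sequence of values that Paxos would choose, and the `Master` class and region state names are hypothetical. It shows that a lagging standby holds a stale but valid prefix of the state, and that after replaying the missing entries it holds exactly the active master's state, with no rebuild from a meta table or ZK.

```python
class Master:
    """Toy master replica: its state is determined by the log prefix it applied."""

    def __init__(self, name):
        self.name = name
        self.applied = 0         # number of log entries applied so far
        self.region_states = {}  # region -> state, held purely in memory

    def catch_up(self, committed_log):
        # Apply committed entries this replica has not seen yet, in log order.
        for region, state in committed_log[self.applied:]:
            self.region_states[region] = state
        self.applied = len(committed_log)


def run():
    # Assumption: stand-in for the ordered values agreed on through Paxos.
    committed_log = []
    active, standby = Master("m1"), Master("m2")

    committed_log.append(("r1", "OPENING"))
    active.catch_up(committed_log)
    standby.catch_up(committed_log)

    committed_log.append(("r1", "OPENED"))
    committed_log.append(("r2", "OPENING"))
    active.catch_up(committed_log)  # the standby now lags by two entries
    lagging_view = dict(standby.region_states)

    # The active master dies; the standby replays the missing suffix and takes over.
    standby.catch_up(committed_log)
    return lagging_view, standby.region_states, active.region_states


if __name__ == "__main__":
    print(run())
```

The only failover work in this sketch is replaying the log suffix the standby had not yet applied, which is bounded by how far it lagged rather than by the size of the cluster state.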

I know this might look like a very crazy re-architecture, but it deserves deep
thought and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
