[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894362#comment-13894362
 ] 

Feng Honghua commented on HBASE-10296:
--------------------------------------

bq.The only thing is that, this would require multi master setup unless we keep 
the logs in hdfs and be able to use a single node RAFT quorum right?
Do you mean master back-compatibility issue? 

No master back-compatibility when replacing zk with embedded consensus lib, nor 
between different incremental phases.

My point when proposing increment approach is that the data/functionalities 
provided by zk have different urgency level to be moved to inside master 
processes, data such as region assignment info has top urgency level to be 
moved inside master processes, configuration-like data has less urgent 
level(which needs additional watch/notify feature), liveness monitoring 
functionality has the least urgent level(which needs additional heart-beat 
feature)...we can remarkably improve master failover performance and eliminate 
inconsistency by replicating region assignment info among master processes' 
memory, the foremost concern/goal of this jira.

To eliminate ZK as a whole to reduce deploying and machine number/roles, as 
[~stack] said "...simplifying hbase packaging and deploy. There would be no 
more need to set aside machines for this special role", we also need to 
implement liveness monitoring(for regionservers) and watch/notify features 
within master processes...

Above features are independent and can be implemented in an incremental 
fashion, that's what I meant by 'incremental', but certainly we can implemented 
them as a whole.

Not sure I understand your question correctly, and wonder whether I answer your 
question, any further clarification is welcome :-)

> Replace ZK with a consensus lib(paxos,zab or raft) running within master 
> processes to provide better master failover performance and state consistency
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10296
>                 URL: https://issues.apache.org/jira/browse/HBASE-10296
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: master, Region Assignment, regionserver
>            Reporter: Feng Honghua
>
> Currently master relies on ZK to elect active master, monitor liveness and 
> store almost all of its states, such as region states, table info, 
> replication info and so on. And zk also plays as a channel for 
> master-regionserver communication(such as in region assigning) and 
> client-regionserver communication(such as replication state/behavior change). 
> But zk as a communication channel is fragile due to its one-time watch and 
> asynchronous notification mechanism which together can leads to missed 
> events(hence missed messages), for example the master must rely on the state 
> transition logic's idempotence to maintain the region assigning state 
> machine's correctness, actually almost all of the most tricky inconsistency 
> issues can trace back their root cause to the fragility of zk as a 
> communication channel.
> Replace zk with paxos running within master processes have following benefits:
> 1. better master failover performance: all master, either the active or the 
> standby ones, have the same latest states in memory(except lag ones but which 
> can eventually catch up later on). whenever the active master dies, the newly 
> elected active master can immediately play its role without such failover 
> work as building its in-memory states by consulting meta-table and zk.
> 2. better state consistency: master's in-memory states are the only truth 
> about the system,which can eliminate inconsistency from the very beginning. 
> and though the states are contained by all masters, paxos guarantees they are 
> identical at any time.
> 3. more direct and simple communication pattern: client changes state by 
> sending requests to master, master and regionserver talk directly to each 
> other by sending request and response...all don't bother to using a 
> third-party storage like zk which can introduce more uncertainty, worse 
> latency and more complexity.
> 4. zk can only be used as liveness monitoring for determining if a 
> regionserver is dead, and later on we can eliminate zk totally when we build 
> heartbeat between master and regionserver.
> I know this might looks like a very crazy re-architect, but it deserves deep 
> thinking and serious discussion for it, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to