[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895363#comment-13895363
 ] 

Feng Honghua commented on HBASE-10296:
--------------------------------------

bq.There have been at least 3 proposals so far for a master + assignment rewrite 
in HBASE-5487, and all want to get rid of zk and fix assignment.
Agreed, but all of those proposals still persist data such as assignment state 
in third-party storage outside the master processes/machines (zk, or an 
auxiliary system table), so:
# after the previous master dies, the new active master must read that data 
back from the external storage before it can serve as active master, hence 
suboptimal master failover performance.
# the same data is still maintained in two different locations, master memory 
and the external third-party storage, hence potential consistency issues.
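
To make the failover point concrete, here is a minimal sketch of masters that 
each hold the full state in memory and only accept a change once a majority is 
reachable. It is a toy, not Paxos/Raft/ZAB: the class names (Master, 
MasterGroup), the region names and the state strings are all hypothetical, and 
terms, log repair and lagging replicas are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """One master process holding a full in-memory copy of region states."""
    name: str
    alive: bool = True
    region_states: dict = field(default_factory=dict)

    def apply(self, region: str, state: str) -> None:
        self.region_states[region] = state


class MasterGroup:
    """Toy stand-in for a consensus group: changes need a live majority."""

    def __init__(self, masters):
        self.masters = list(masters)
        self.active = self.masters[0]

    def quorum(self) -> int:
        return len(self.masters) // 2 + 1

    def commit(self, region: str, state: str) -> bool:
        live = [m for m in self.masters if m.alive]
        if len(live) < self.quorum():
            return False  # no majority reachable, so the change is rejected
        for m in live:
            m.apply(region, state)
        return True

    def fail_over(self) -> Master:
        """Mark the active master dead and promote the first live standby."""
        self.active.alive = False
        self.active = next(m for m in self.masters if m.alive)
        return self.active


group = MasterGroup([Master("m1"), Master("m2"), Master("m3")])
group.commit("region-a", "OPEN")
group.commit("region-b", "OPENING")
new_active = group.fail_over()
print(new_active.name, new_active.region_states)
```

The promoted standby already has every committed region state in memory, so in 
this sketch there is nothing to read from zk or a system table before it starts 
serving.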

bq.What I was trying to understand is about the deployment...with the 
incremental approach, we might even implement RAFT quorum inside region server 
processes, so that we gradually get rid of the master role as well, and have 
only 1 type of server, where (2n+1) of them would act like masters (while still 
serving data).
Now I understand what you meant :-). If we take the incremental approach, we 
end up with 3 zk, 3 masters and N regionservers, and yes, that is a suboptimal 
setup :-(.
If we implement all the functionality that zk provides for HBase (data 
replication, master election, liveness monitoring and watch/notify) and 
eliminate zk entirely, an HBase deployment becomes (3 masters + N regionservers).
Though running the master and regionserver roles concurrently within a single 
server would eventually be workable, I'm not a fan of this deployment:
# the master and regionserver roles can affect each other, which makes it hard 
to debug/diagnose when an issue arises
# master and regionserver are both memory-consuming: servers running both roles 
need their memory usage balanced between the two, and servers running only the 
regionserver role need a different regionserver memory/heap configuration to 
take full advantage of the available memory
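
Of the zk functions listed above, liveness monitoring is the easiest to picture 
without zk. A minimal sketch, assuming regionservers send periodic heartbeats 
to the master; HeartbeatMonitor, the 30-second timeout and the server names are 
hypothetical, and time is passed in explicitly to keep the example 
deterministic:

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time per regionserver (hypothetical helper)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, server: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this server.
        self.last_seen[server] = now

    def dead_servers(self, now: float) -> list:
        # A server is considered dead once its last heartbeat is older
        # than the timeout.
        return sorted(
            s for s, t in self.last_seen.items() if now - t > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.heartbeat("rs1", now=0.0)
monitor.heartbeat("rs2", now=0.0)
monitor.heartbeat("rs1", now=25.0)
print(monitor.dead_servers(now=40.0))  # rs2 has been silent for 40 seconds
```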

> Replace ZK with a consensus lib(paxos,zab or raft) running within master 
> processes to provide better master failover performance and state consistency
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10296
>                 URL: https://issues.apache.org/jira/browse/HBASE-10296
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: master, Region Assignment, regionserver
>            Reporter: Feng Honghua
>
> Currently the master relies on ZK to elect the active master, monitor 
> liveness and store almost all of its state, such as region states, table 
> info, replication info and so on. zk also serves as a channel for 
> master-regionserver communication (such as in region assignment) and 
> client-regionserver communication (such as replication state/behavior change). 
> But zk as a communication channel is fragile due to its one-time watch and 
> asynchronous notification mechanism, which together can lead to missed 
> events (hence missed messages). For example, the master must rely on the 
> idempotence of the state transition logic to keep the region assignment 
> state machine correct. In fact, almost all of the trickiest inconsistency 
> issues can be traced back to the fragility of zk as a communication 
> channel.
> Replacing zk with paxos running within the master processes has the 
> following benefits:
> 1. better master failover performance: all masters, whether active or 
> standby, have the same latest state in memory (except lagging ones, which 
> eventually catch up). Whenever the active master dies, the newly elected 
> active master can immediately play its role without failover work such as 
> rebuilding its in-memory state by consulting the meta table and zk.
> 2. better state consistency: the master's in-memory state is the only truth 
> about the system, which can eliminate inconsistency from the very beginning. 
> And though the state is held by all masters, paxos guarantees the copies are 
> identical at any time.
> 3. a more direct and simple communication pattern: the client changes state 
> by sending requests to the master, and master and regionserver talk directly 
> to each other by sending requests and responses. None of them needs to go 
> through third-party storage like zk, which can introduce more uncertainty, 
> worse latency and more complexity.
> 4. zk would then be used only for liveness monitoring, to determine whether a 
> regionserver is dead, and later on we can eliminate zk entirely once we 
> build heartbeats between master and regionserver.
> I know this might look like a very crazy re-architecture, but it deserves 
> deep thought and serious discussion, right?
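
The missed-event problem described in the issue text can be illustrated with a 
toy model of a one-time watch. This is not the ZooKeeper client API: 
OneTimeWatchNode and the state names are hypothetical, and the real 
notification is asynchronous, which only widens the gap shown here.

```python
class OneTimeWatchNode:
    """Toy znode-like value whose watch fires once and must be re-registered."""

    def __init__(self, value: str):
        self.value = value
        self._watcher = None

    def get_and_watch(self, watcher) -> str:
        # Read the current value and register a single-use watch.
        self._watcher = watcher
        return self.value

    def set(self, value: str) -> None:
        self.value = value
        watcher, self._watcher = self._watcher, None  # the watch is consumed
        if watcher is not None:
            watcher(value)


node = OneTimeWatchNode("OFFLINE")
seen = []
node.get_and_watch(seen.append)

node.set("PENDING_OPEN")  # the watch fires once
node.set("OPENING")       # no watch is registered, so nothing is delivered
node.set("OPENED")        # same here
current = node.get_and_watch(seen.append)  # re-register and read the latest

print(seen, current)
```

The observer is notified of PENDING_OPEN and later reads OPENED, but never 
sees OPENING, which is why the transition logic has to tolerate skipped and 
repeated states.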



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
