[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868800#comment-13868800 ]
Feng Honghua commented on HBASE-10296: -------------------------------------- bq.The google chubby paper goes into some detail about why they implemented a Paxos Service and not a paxos library. I believe google should have a paxos library, which is used in megastore and spanner, right? And this fact is mentioned in a google paper *paxos made live* :-) Implementing paxos as a standalone/shared service or a library has their own benefits and drawbacks. A service: simple API and simple for app to use, can be shared by multiple apps; but abuse by one app can negatively influence other apps using the same paxos service (we ever encountered several times such cases before :-() A library: more difficult for app to use, but have better isolation level(won't be affected by possible abuse from other app), and have more primitives and more flexibility. > Replace ZK with a paxos running within master processes to provide better > master failover performance and state consistency > --------------------------------------------------------------------------------------------------------------------------- > > Key: HBASE-10296 > URL: https://issues.apache.org/jira/browse/HBASE-10296 > Project: HBase > Issue Type: Brainstorming > Components: master, Region Assignment, regionserver > Reporter: Feng Honghua > > Currently master relies on ZK to elect active master, monitor liveness and > store almost all of its states, such as region states, table info, > replication info and so on. And zk also plays as a channel for > master-regionserver communication(such as in region assigning) and > client-regionserver communication(such as replication state/behavior change). > But zk as a communication channel is fragile due to its one-time watch and > asynchronous notification mechanism which together can leads to missed > events(hence missed messages), for example the master must rely on the state > transition logic's idempotence to maintain the region assigning state > machine's correctness, actually almost all of the most tricky inconsistency > issues can trace back their root cause to the fragility of zk as a > communication channel. > Replace zk with paxos running within master processes have following benefits: > 1. better master failover performance: all master, either the active or the > standby ones, have the same latest states in memory(except lag ones but which > can eventually catch up later on). whenever the active master dies, the newly > elected active master can immediately play its role without such failover > work as building its in-memory states by consulting meta-table and zk. > 2. better state consistency: master's in-memory states are the only truth > about the system,which can eliminate inconsistency from the very beginning. > and though the states are contained by all masters, paxos guarantees they are > identical at any time. > 3. more direct and simple communication pattern: client changes state by > sending requests to master, master and regionserver talk directly to each > other by sending request and response...all don't bother to using a > third-party storage like zk which can introduce more uncertainty, worse > latency and more complexity. > 4. zk can only be used as liveness monitoring for determining if a > regionserver is dead, and later on we can eliminate zk totally when we build > heartbeat between master and regionserver. > I know this might looks like a very crazy re-architect, but it deserves deep > thinking and serious discussion for it, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)