Well... since that was mentioned anyway, allow me a tiny correction/clarification. :)
It's ConsensusNode, not ConsistencyNode, and it's not really a custom Paxos implementation; it's more like an interface for a coordination service atop the standard NameNode, which may be backed by any consensus library/algorithm, be it a variation of Paxos, ZooKeeper/ZAB, Raft, or anything else. The consensus API itself (the ConsensusNode code) and a ZooKeeper-based implementation of the consensus protocol are going to be open-sourced (we're working on it), and once it's out, authors of consensus libraries are welcome to start integrating their libs too.

Regarding HBase - that's actually what's being developed under HBASE-10909, HBASE-10866 and the referenced jiras (everybody interested is welcome to discuss and give feedback).
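To give a rough idea of what "pluggable" means here - this is only an illustrative sketch, not the actual ConsensusNode code (which, again, isn't published yet); all names below are invented:

// Hypothetical sketch of a pluggable consensus API, not the real
// ConsensusNode interfaces. The idea: the NameNode submits each
// namespace mutation as a proposal, and every node applies it only
// once the consensus backend (ZooKeeper/ZAB, a Paxos variant, Raft,
// ...) has agreed on it and on its order.
import java.io.IOException;

public interface ConsensusProvider {

  // Submit a serialized namespace mutation to the consensus backend.
  // Returns the agreed-upon proposal id once the quorum accepts it.
  long propose(byte[] namespaceMutation) throws IOException;

  // Callback invoked on every node, in the agreed order, for each
  // committed proposal; the NameNode would apply the mutation to its
  // namespace and edit log here.
  interface Learner {
    void learn(long proposalId, byte[] namespaceMutation);
  }

  void registerLearner(Learner learner);
}

Nothing in an API shaped like this is Paxos-specific, which is exactly why any consensus library could plug in underneath.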
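On the client read behavior Enis describes below (one replica per read, failover on error, last seen proposal id carried as a high-water mark - which would give read-your-writes rather than full Paxos reads), an illustrative sketch of how such a wrapper could look; every type and method name here is invented:

// Hypothetical client-side wrapper: read from a single replica
// NameNode, fail over to another replica on error, and track the last
// seen proposal id so a replica that is behind our own writes can be
// detected and skipped.
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;

public class RetryingMetadataClient {

  interface NameNodeReplica {
    // minProposalId is a hypothetical hint: "serve this read only if
    // you have applied at least this proposal".
    FileStatus getFileStatus(String path, long minProposalId)
        throws IOException;
    long lastAppliedProposalId();
  }

  static class FileStatus { /* file metadata fields elided */ }

  static class StaleReplicaException extends IOException { }

  private final List<NameNodeReplica> replicas;
  private long lastSeenProposalId; // high-water mark across our calls

  public RetryingMetadataClient(List<NameNodeReplica> replicas) {
    this.replicas = replicas;
  }

  public FileStatus getFileStatus(String path) throws IOException {
    IOException lastError = null;
    for (NameNodeReplica replica : replicas) {
      try {
        FileStatus status =
            replica.getFileStatus(path, lastSeenProposalId);
        lastSeenProposalId =
            Math.max(lastSeenProposalId, replica.lastAppliedProposalId());
        return status;
      } catch (FileNotFoundException | StaleReplicaException e) {
        lastError = e; // this replica may simply be behind; try the next
      }
    }
    throw lastError != null ? lastError
        : new IOException("no replica reachable");
  }
}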
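And the multi-DC flow described below (finalize block -> quorum schedules remote copy -> remote DC replicates up to its own target factor -> "replication complete" proposal) suggests a per-block, per-DC state machine along these lines - again purely my own illustration, including the exception name, which Enis himself wasn't sure about:

// Illustrative only: the per-(block, datacenter) states implied by the
// multi-DC flow described below. A read in a DC that has not yet
// reached REPLICATION_COMPLETE is presumably what would surface the
// DataNotReplicatedException-like error mentioned.
public enum BlockReplicationState {
  FINALIZED_LOCALLY,        // block written and finalized in source DC
  REMOTE_COPY_SCHEDULED,    // source DC's CNode quorum scheduled a direct
                            // datanode-to-remote-datanode copy
  COPIED_TO_REMOTE_DC,      // one replica now exists in the remote DC
  REPLICATING_IN_REMOTE_DC, // remote DC's own CNode quorum replicates up
                            // to that DC's target replication factor
  REPLICATION_COMPLETE      // remote DC proposed, and its quorum
                            // accepted, that the target was reached
}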
-Mikhail

2014-04-07 11:36 GMT-07:00 Enis Söztutar <e...@hortonworks.com>:

> Ops, sorry, this was intended for internal lists. Apologies for any
> confusion.
>
> Enis
>
> On Monday, April 7, 2014, Enis Söztutar <e...@hortonworks.com> wrote:
>
> > Devaraj and I attended their talk on their solution for a Paxos-based
> > namenode and HBase replication.
> >
> > They have two solutions: one for a single datacenter, and the other
> > for multi-DC geo-replication.
> >
> > For the namenode, there is a wrapper, called ConsistencyNode, that
> > basically takes the requests and replicates them in the edit log, via
> > their consistency protocol, to the other CNodes within the DC
> > (Paxos-based). If the proposal is accepted, the changes are made
> > durable. However, from my understanding, on the read side the client
> > chooses only one replica to read from. The client decides to connect
> > to one of the replica namenodes, which means that it is not doing a
> > Paxos read. I think they also wrapped the client, so that if it gets
> > a FileNotFoundException or something similar, it will retry on a
> > different server. They also track the last seen proposal id as a
> > transaction id, from my understanding (so read-what-you-write
> > consistency, maybe?). The full details of the consistency model were
> > not clear to me from the presentation.
> >
> > For their multi-DC replication, they are doing a similar thing, but
> > the data replication is not handled by Paxos, only the namenode
> > metadata. Each datacenter has a target replication factor (which can
> > be set differently for each DC, e.g. 0 for regulatory reasons). The
> > NN metadata is replicated via a similar mechanism. The data
> > replication is async to the metadata replication, though. When a
> > block is finalized, the CNode quorum in that particular DC schedules
> > a remote copy to one of the other datacenters. That copy job copies
> > the block by writing it directly from the datanode to a remote
> > datanode. Then that block is replicated up to the target replication
> > factor by the remote DC's CNode quorum. When the target is reached,
> > that DC will create another proposal saying that the data replication
> > is complete. So the state machine probably tracks where each piece of
> > data is replicated, but they were still mentioning the client getting
> > a DataNotReplicatedException or something.
> >
> > Their work on HBase is still WIP. I do not remember many details of
> > the protocol, except that it uses the same replication protocol
> > (their "patented" Paxos-based replication).
> >
> > Of course the devil is in the details. I did not get that from the
> > presentation.
> >
> > As a side note, Doug, when asked, was saying that they are cooking
> > something for backups, so maybe their "secret project" also contains
> > multi-DC consistent state?
> >
> > Enis
> >
> >
> > On Sat, Apr 5, 2014 at 1:55 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> Enis:
> >> There was a talk by Konstantin Boudnik
> >> <http://hadoopsummit.org/amsterdam/speakers/#konstantin-boudnik>.
> >>
> >> Any interesting material from his presentation ?
> >>
> >> Cheers
> >


--
Thanks,
Michael Antonov