[ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762484#comment-13762484 ]
Jeffrey Zhong commented on HBASE-9360:
--------------------------------------

[~jdcryans] You mentioned another valid scenario for 0.94 -> 0.96 replication support. Minimizing the upgrade time (or pain) is just one of the driving factors. Thanks.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -------------------------------------------------------------
>
>                 Key: HBASE-9360
>                 URL: https://issues.apache.org/jira/browse/HBASE-9360
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: migration
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Jeffrey Zhong
>
> As we know, 0.96 is a singularity release. As of today, a 0.94 HBase user has
> to do an in-place upgrade: make the corresponding client changes, recompile
> client application code, fully shut down the existing 0.94 HBase cluster,
> deploy the 0.96 binary, run the upgrade script, and then start the upgraded
> cluster. You can imagine the downtime will be extended if something goes
> wrong in between.
> To minimize the downtime, another possible way is to set up a secondary 0.96
> cluster and then set up replication between the existing 0.94 cluster and the
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Set up a 0.96 cluster
> 2) Set up replication from the running 0.94 cluster to the newly created 0.96
> cluster
> 3) Wait until the two clusters are in sync
> 4) Start duplicated writes to both the 0.94 and 0.96 clusters (replication
> could be stopped now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if
> everything is good, which completes the upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk].
> Currently it seems the best approach is to build a service very similar to,
> or on top of,
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support for
> three commands: replicateLogEntries, multi and delete. Inside the three
> commands, we just pass the corresponding requests down to the destination
> 0.96 cluster, so the service acts as a bridge. multi and delete are supported
> so that CopyTable can copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support for the 0.94 RPC protocol in
> 0.96. One issue with this is that a 0.94 client needs to talk to ZooKeeper
> first before it can connect to a 0.96 region server. Therefore, we would need
> a fake ZooKeeper setup in front of the 0.96 cluster for a 0.94 client to
> connect to. It may also pollute the 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 cluster at the same time,
> we need to load both HBase clients into a single JVM using different class
> loaders.
> Let me know if you think this is worth doing and whether there is a better
> approach we could take.
> Thanks!
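
For task 2 in the quoted description (both HBase clients in one JVM under different class loaders), a minimal sketch of the loader setup could look like the following. It is not taken from the issue: the class name `IsolatedClientLoaders`, the helper `isolatedLoader`, and the jar paths are all hypothetical, and a real setup would need every dependency jar of each client version on the matching loader.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IsolatedClientLoaders {

    /**
     * Builds a class loader that sees only the given jars plus the JDK's own classes.
     * The parent is the parent of the system loader (the extension loader on Java 8,
     * the platform loader on Java 9+), so application classpath entries do not leak in.
     */
    public static URLClassLoader isolatedLoader(Path... jars) throws MalformedURLException {
        URL[] urls = new URL[jars.length];
        for (int i = 0; i < jars.length; i++) {
            urls[i] = jars[i].toUri().toURL();
        }
        return new URLClassLoader(urls, ClassLoader.getSystemClassLoader().getParent());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar locations: substitute the real client jars and their dependencies.
        try (URLClassLoader client094 = isolatedLoader(Paths.get("lib/0.94/hbase.jar"));
             URLClassLoader client096 = isolatedLoader(Paths.get("lib/0.96/hbase-client.jar"))) {
            // Each loader resolves HBase classes from its own jars only, so the same
            // class name can be loaded twice, once per version, without a clash.
            System.out.println("0.94 loader: " + client094);
            System.out.println("0.96 loader: " + client096);
        }
    }
}
```

With loaders like these, application code would typically reach each client version through reflection or through a small adapter interface that lives on the shared parent side, since a class loaded by one loader cannot be cast to the same-named class from the other.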