Jeffrey Zhong created HBASE-9360:
------------------------------------

             Summary: Enable 0.94 -> 0.96 replication to minimize upgrade down time
                 Key: HBASE-9360
                 URL: https://issues.apache.org/jira/browse/HBASE-9360
             Project: HBase
          Issue Type: Brainstorming
          Components: migration
    Affects Versions: 0.98.0, 0.96.0
            Reporter: Jeffrey Zhong


As we know, 0.96 is a singularity release. As of today, a 0.94 HBase user has
to do an in-place upgrade: make the corresponding client changes, recompile
the client application code, fully shut down the existing 0.94 HBase cluster,
deploy the 0.96 binaries, run the upgrade script, and then start the upgraded
cluster. You can imagine that the downtime will be extended if anything goes
wrong along the way.

To minimize the downtime, another possible way is to set up a secondary 0.96
cluster and then set up replication from the existing 0.94 cluster to the new
0.96 slave cluster. Once the 0.96 cluster is in sync, a user can switch
traffic to the 0.96 cluster and decommission the old one.

The ideal steps will be:

1) Set up a 0.96 cluster
2) Set up replication from the running 0.94 cluster to the newly created 0.96
cluster
3) Wait until the two clusters are in sync through replication
4) Start duplicating writes to both the 0.94 and 0.96 clusters (replication
could be stopped at this point)
5) Forward read traffic to the slave 0.96 cluster
6) After a certain period, if everything looks good, stop writes to the
original 0.94 cluster to complete the upgrade
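
The dual-write and read-forwarding steps above can be sketched roughly as
follows. This is only an illustration: Store, InMemoryStore and
DualWriteRouter are hypothetical names that stand in for real HBase table
clients, and a real implementation would also have to handle partial write
failures.

```java
import java.util.HashMap;
import java.util.Map;

public class DualWriteRouter {
    /** Hypothetical minimal store interface; stands in for a real table client. */
    public interface Store {
        void put(String row, String value);
        String get(String row);
    }

    /** In-memory stand-in for a cluster, for illustration only. */
    public static class InMemoryStore implements Store {
        private final Map<String, String> data = new HashMap<>();
        @Override public void put(String row, String value) { data.put(row, value); }
        @Override public String get(String row) { return data.get(row); }
    }

    private final Store oldCluster;   // plays the role of the 0.94 cluster
    private final Store newCluster;   // plays the role of the 0.96 cluster
    private volatile boolean writeToOld = true;

    public DualWriteRouter(Store oldCluster, Store newCluster) {
        this.oldCluster = oldCluster;
        this.newCluster = newCluster;
    }

    /** Step 4: every write goes to both clusters until the cutover. */
    public void put(String row, String value) {
        if (writeToOld) {
            oldCluster.put(row, value);
        }
        newCluster.put(row, value);
    }

    /** Step 5: reads are served by the new cluster only. */
    public String get(String row) {
        return newCluster.get(row);
    }

    /** Step 6: stop writing to the old cluster once the new one is trusted. */
    public void stopWritesToOld() {
        writeToOld = false;
    }
}
```

Until stopWritesToOld() is called, the old cluster keeps receiving every
write, so traffic can still be switched back to it if a problem shows up.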

To get us there, there are two tasks:

1) Enable replication from 0.94 -> 0.96
I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it
seems the best approach is to build a very similar service to, or build on
top of, https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with
support for three commands: replicateLogEntries, multi and delete. Inside the
three commands, we just pass the corresponding requests through to the
destination 0.96 cluster, acting as a bridge. multi and delete are supported
so that CopyTable can copy data from a 0.94 cluster to a 0.96 one.
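
A minimal sketch of the bridge idea, assuming a pure pass-through design. The
three public method names come from the proposal above; the Destination
interface, its method names, and the use of plain strings for edits and
actions are hypothetical simplifications and are not the real 0.94 or 0.96
RPC types.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicationBridge {
    /** Hypothetical view of the destination 0.96 cluster. */
    public interface Destination {
        void applyEdits(List<String> edits);
        void applyBatch(List<String> actions);
        void delete(String row);
    }

    private final Destination destination;

    public ReplicationBridge(Destination destination) {
        this.destination = destination;
    }

    /** Called by the 0.94 replication source; forwards the edits unchanged. */
    public void replicateLogEntries(List<String> entries) {
        destination.applyEdits(entries);
    }

    /** Forwards a batch of actions, as CopyTable would issue them. */
    public void multi(List<String> actions) {
        destination.applyBatch(actions);
    }

    /** Forwards a single delete. */
    public void delete(String row) {
        destination.delete(row);
    }

    /** Test double that records which calls reached the destination. */
    public static class RecordingDestination implements Destination {
        public final List<String> calls = new ArrayList<>();
        @Override public void applyEdits(List<String> edits) { calls.add("edits:" + edits.size()); }
        @Override public void applyBatch(List<String> actions) { calls.add("batch:" + actions.size()); }
        @Override public void delete(String row) { calls.add("delete:" + row); }
    }
}
```

The bridge holds no state of its own here; any translation between the two
wire formats would live behind the Destination implementation.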

The other approach is to provide limited support for the 0.94 RPC protocol in
0.96. One issue with this is that a 0.94 client needs to talk to ZooKeeper
first before it can connect to a 0.96 region server. Therefore, we would need
a fake ZooKeeper setup in front of the 0.96 cluster for a 0.94 client to
connect to. It may also pollute the 0.96 code base with 0.94 RPC code.

2) To support writes to a 0.96 cluster and a 0.94 cluster at the same time,
we need to load both HBase clients into a single JVM using different class
loaders.
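
One way to get that isolation, sketched with the standard
java.net.URLClassLoader. IsolatedClientLoader and its two helper methods are
hypothetical names; the idea is that each client version gets its own loader
with no application-level parent, so same-named classes from the two jar sets
never collide.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedClientLoader {
    /**
     * Creates a loader that sees only the given jars plus the JDK bootstrap
     * classes. Passing null as the parent keeps the application classpath
     * (and any other HBase version on it) out of view.
     */
    public static URLClassLoader isolatedLoader(URL... jars) {
        return new URLClassLoader(jars, null);
    }

    /** Loads a class by name through the given loader without initializing it. */
    public static Class<?> loadFrom(ClassLoader loader, String className)
            throws ClassNotFoundException {
        return Class.forName(className, false, loader);
    }
}
```

A caller would build one loader from the 0.94 client jars and another from
the 0.96 client jars, then load the client class of the same name through
each. The two resulting Class objects are distinct types, so code outside the
loaders has to drive them through reflection or through a small shared
interface made visible to both loaders.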

Let me know if you think this is worth doing, or if there is a better
approach we could take.

Thanks!


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira