[ https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316600#comment-14316600 ]
Andrew Purtell edited comment on HBASE-12814 at 2/11/15 5:34 PM:
-----------------------------------------------------------------

What do people think about making this a pluggable replication endpoint implementation option in its own Maven module? I think that would be a short path to commit since it side-steps a lot of the issues raised in my previous comment.

Edit: Obviously this works for 0.98+. For 0.94, it's a separate question.

> Zero downtime upgrade from 94 to 98
> ------------------------------------
>
>                 Key: HBASE-12814
>                 URL: https://issues.apache.org/jira/browse/HBASE-12814
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.26, 0.98.10
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
>
>
> Here at Flurry we want to upgrade our HBase cluster from 0.94 to 0.98 without any downtime while maintaining master/master replication.
>
> Summary:
> Replication between clusters is done via Thrift RPC. It is configurable on a peer-by-peer basis. The one caveat is that a Thrift server starts up on every node and proxies requests to the ReplicationSink.
>
> For the upgrade process:
> * in hbase-site.xml, the following configuration parameters are added:
> ** *Required*
> *** hbase.replication.sink.enable.thrift -> true
> *** hbase.replication.thrift.server.port -> <thrift_server_port>
> ** *Optional*
> *** hbase.replication.thrift.protection {default: AUTHENTICATION}
> *** hbase.replication.thrift.framed {default: false}
> *** hbase.replication.thrift.compact {default: true}
> - All regionservers can then be rolling-restarted (no downtime). All clusters must have the respective patch applied for this to work.
> - The hbase shell add_peer command takes an additional parameter for the RPC protocol, for example:
> {code}
> add_peer '1', "hbase-101:2181:/hbase", "THRIFT"
> {code}
>
> Now comes the fun part. When you want to upgrade a cluster from 0.94 to 0.98, you simply pause replication to the cluster being upgraded, do the upgrade, and un-pause replication. Once you have a pair of clusters that both run the 0.98 release and replicate with each other inbound and outbound, you can start replicating via the native RPC protocol. To do that, add the peer again without the _THRIFT_ parameter and then remove the peer that uses the Thrift protocol. Because replication is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing replication.
>
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham for his invaluable knowledge and assistance.
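Below is a minimal hbase-site.xml sketch of the configuration parameters listed in the description. It assumes the HBASE-12814 patch is applied, since these properties come from that patch rather than stock HBase. The properties would be merged into the existing <configuration> element on every regionserver of each participating cluster. The port number 9091 is a hypothetical choice. The three optional properties are shown with the defaults stated in the description, so they could be left out entirely.

{code:xml}
<!-- Required: turn on the thrift replication sink and choose its port. -->
<property>
  <name>hbase.replication.sink.enable.thrift</name>
  <value>true</value>
</property>
<property>
  <name>hbase.replication.thrift.server.port</name>
  <!-- hypothetical port number; pick one that is free on the regionservers -->
  <value>9091</value>
</property>

<!-- Optional: the values below are the defaults given in the description. -->
<property>
  <name>hbase.replication.thrift.protection</name>
  <value>AUTHENTICATION</value>
</property>
<property>
  <name>hbase.replication.thrift.framed</name>
  <value>false</value>
</property>
<property>
  <name>hbase.replication.thrift.compact</name>
  <value>true</value>
</property>
{code}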
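The upgrade procedure from the description can be sketched as hbase shell commands. These would be run on a cluster that replicates into the cluster being upgraded. The peer ids '1' and '2' and the cluster key hbase-101:2181:/hbase are illustrative. add_peer, disable_peer, enable_peer and remove_peer are standard shell commands, but the "THRIFT" argument only exists with the HBASE-12814 patch. With master/master replication, the final switch to native RPC presumably has to be repeated on the other cluster as well.

{code}
# Illustrative peer ids and cluster key.
# Peer '1' was set up earlier to ship edits to hbase-101 over thrift:
add_peer '1', "hbase-101:2181:/hbase", "THRIFT"

# Pause shipping to hbase-101; new edits are still tracked for replication.
disable_peer '1'

# ... upgrade hbase-101 from 0.94 to 0.98 ...

# Resume shipping, then wait for the queued backlog to drain.
enable_peer '1'

# Once both clusters run 0.98: add a native-RPC peer for the same
# cluster, then drop the thrift peer. Edits shipped by both peers in
# the meantime are assumed harmless because replication is idempotent.
add_peer '2', "hbase-101:2181:/hbase"
remove_peer '1'
{code}

Adding the native peer before removing the Thrift one avoids a window in which no peer ships edits to hbase-101. The cost is that some edits are shipped twice during the overlap.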