[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-02-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899680#comment-13899680
 ] 

Andrew Purtell commented on HBASE-9360:
---

I would be interested in forward porting something that showed up for 0.96 to 
0.98.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-02-11 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898431#comment-13898431
 ] 

Francis Liu commented on HBASE-9360:


Sorry late to the party. [~saint@gmail.com] mentioned this. We have a 
different approach. We instead extended replication source/sink to use a thrift 
client/server to ship/receive the edits. We plan on using it for 0.94<->0.96 
replication as well as encrypting replication communication. We currently have 
0.94<->0.94 next step is 0.94<->0.96. If people are interested I can try and 
share the code once we have things stable tho 0.94<->0.96 might be a bit later.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-01-17 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875043#comment-13875043
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

Good suggestions. Let me add a paragraph on this in the ref guide in upgrade & 
replication.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-01-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874355#comment-13874355
 ] 

Nick Dimiduk commented on HBASE-9360:
-

A new section in http://hbase.apache.org/book.html#upgrade0.96 plus a note on 
user@ should do the trick.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-01-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874235#comment-13874235
 ] 

stack commented on HBASE-9360:
--

Should at least doc its existence in refguide?  Or play a loud trumpet so a 
bunch get to hear about it (Or post on user list?)

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-01-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874048#comment-13874048
 ] 

Nick Dimiduk commented on HBASE-9360:
-

Should this ticket be resolved then? Is this code we can put in a contrib 
directory, similar to our dev-support directory, or are we happy with it in an 
external repo?

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-01-16 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873939#comment-13873939
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

The work is done. The 
code(https://github.com/hortonworks/HBaseReplicationBridgeServer) can't be 
checked into HBase code because it will mess 0.96 code base with 0.94 RPC code 
though I'd love to. If you want to try out replication between 0.94->0.96, you 
can follow the README @ 
https://github.com/hortonworks/HBaseReplicationBridgeServer to see if it can 
work for you. 

Thanks.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2014-01-16 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873476#comment-13873476
 ] 

Jean-Marc Spaggiari commented on HBASE-9360:


Hi there,

any progress on this JIRA? I still have 2 clusters (One 0.94, one 0.96) that I 
can use to try that...

JM

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-11-12 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820278#comment-13820278
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

Thanks [~jmspaggi]! Very nice of you to try it out. For your question:
{quote}
if writes are happening in 0.94, how are your perfectly sure of the start time 
of the replication?
{quote}
This is same as normal replication setup. After turning on replication for all 
tables in source 0.94 cluster, let's say the timestamp is T1. When you export 
data from source 0.94 cluster, you can export data till T1+15secs(some buffer 
to overlap time clock drift). It means the replication and import have a small 
time window overlap which guarantees all the data are copied over. The overlap 
is safe because replication & import only use puts & deletes, which are 
idempotent, to copy data. 


> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-11-12 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820048#comment-13820048
 ] 

Jean-Marc Spaggiari commented on HBASE-9360:


Hi [~jeffreyz], I have a 0.94.12+1.0.3 and a 0.96.0+2.2.0 clusters. I will most 
probably be able to give this a try. We can move the discussion to he mailing 
list if you want. So far I have done stopped 0.94, a distcp from 0.94 to 0.96, 
started 0.96, as a test since I'm still using 0.94 (Need to try my MR jobs in 
0.96 first). So I can clean 0.96 and give a try to the procedure above.

Only question is, if writes are happening in 0.94, how are your perfectly sure 
of the start time of the replication? Might be one ms later of before what you 
think it is. Is this information stored somewhere?

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-11-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819784#comment-13819784
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

With hbase-9895 checked in, a user can set up replication from 0.94 to 0.96 
hbase cluster as following:

1) setup HBaseReplicationBridgeServer
2) setup replication between source hbase0.94 cluster and destination 
HBaseReplicationBridgeServer instances & mark the time
3) export data from source 0.94 cluster till the time replication is setup
4) import exported data into 0.96 cluster from previous step with the 
replication is on

Thanks.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-10-30 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809858#comment-13809858
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

I did a prototype @https://github.com/hortonworks/HBaseReplicationBridgeServer 
and tested replication from a 0.94 cluster to a 0.96 cluster.

The remaining work is to bring the slave cluster to a good base(which we're 
currently using CopyTable) when setting up replication.  

h6. Support import from a 0.94 export sequence file. 
Currently we PBed org.apache.hadoop.hbase.client.Result so a 0.96 cluster 
cannot import 0.94 exported files while we can easily add that support. 
(personal preferred option)

h6. Use snapshot without any code changes:
1) Setup replication without starting replication bridge server so source 
cluster is queuing WALs
2) Use Snapshot to bring destination cluster to a good base
3) Starts replication bridge servers to drain WALs queued up from step 1

h6. Enable CopyTable against Replication Bridge Server
This option is least desired one because it will involves significant code 
changes to have a faked root znode, support root table scan, support meta table 
scan, delete and multi command in replication bridge server.



> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-09-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762484#comment-13762484
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

[~jdcryans] You mentioned another valid scenario to have 0.94->0.96 replication 
support. Minimizing the upgrade time(or pain) is just one of the driving 
factors. Thanks.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-09-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762255#comment-13762255
 ] 

Jean-Daniel Cryans commented on HBASE-9360:
---

When we talked about this, I didn't fully understand that the use case you had 
in mind was to enable people to upgrade with minimal downtime. I'm personally 
more interested in supporting simple master-slave replication where the master 
is the one that stays on 0.94 the longest, which is something people using 
replication will be facing.

Right now, upgrading a master-slave DR setup to 0.96 will involve either 
upgrading everything to 0.96 at the same time or pausing replication while the 
master isn't upgraded. [~v.himanshu] was saying that he tested the latter 
(although not explicitly) and it works, but still you aren't replicating for X 
amount of time.

It would be nice to have a way to keep replicating.

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-08-28 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752600#comment-13752600
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

Thanks a lot for the inputs on this. I haven't tried loading both versions of 
Hbase client in one JVM. Without #2, we still can use replication to minimize 
the upgrade down time to seconds.

{code}
Has anyone asked for it?
{code}
So far no one is asking for this. With 0.94->0.96 replication support, we can 
ease upgrade pain and encourage people to test 0.96 against their shadow 
clusters of 0.94 production env.


> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-08-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752130#comment-13752130
 ] 

Lars Hofhansl commented on HBASE-9360:
--

#2 does not work, it's something we've been thinking about to do a 0.96 upgrade 
eventually. So far we've punted this to "some OSGi magic" that will let us load 
both clients into the same VM.


> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-08-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752123#comment-13752123
 ] 

stack commented on HBASE-9360:
--

On 1., a bridge would be cool.  Has anyone asked for it? (No to polluting 0.96 
w/ 0.94 rpc code -- at least at first blush)

On 2., have you tried it?

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira