On 22-10-2013 09:53, Shane McEwan wrote:
On 21/10/13 17:57, Joe Caswell wrote:
The only use case left for reip is when you have simultaneously changed the
node name for every node in the cluster, such as when loading an entire
cluster's worth of backups to new machines.
When I need to do this I just create a new, empty cluster with the new
names. Then shut down the cluster and restore only the data directories
(leveldb, for example) from the backup, leaving the ring directory
alone. Then I start up the cluster and it finds the restored data.
Thanks for the tip. Beware that this will not restore bucket props, because they are stored in the ring dir and not the data dir.
You need to be careful about restoring the old node's data to the
corresponding new node, otherwise you'll get hinted handoffs flying
between all your nodes, but after a bit of trial and error you can figure
out which node is which.
When you create the new, empty cluster, Riak distributes the partitions between the nodes using the claim function. I believe claim_v2 (riak_core_claim.erl) is still the default claim function, and it will produce different partition distributions on different runs when joining nodes to form a cluster, including the dreaded 12,12,12,12,16 split on a default config with 5 nodes. I guess sometimes you'll be lucky and the shiny new cluster will have the same partition distribution as the backup, but in my experience this is not always the case, which means the new cluster will need to hand off data between nodes to match the data to the ring files' partition distribution.
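For what it's worth, a quick way to check whether you got lucky is to compare the per-node partition counts of the backed-up ring file with those of the new cluster's ring file. A rough sketch, assuming an Erlang shell with riak_core on the code path; the module name ring_summary is made up, and riak_core_ring_manager:read_ringfile/1 and riak_core_ring:all_owners/1 are the calls I believe the ring manager itself uses, so double-check them against your version:

    %% ring_summary.erl - rough sketch: summarise partition ownership of a ring file,
    %% so a backed-up ring can be compared with a freshly built cluster's ring.
    %% Assumes riak_core is on the code path; verify the calls against your Riak version.
    -module(ring_summary).
    -export([owners_per_node/1]).

    owners_per_node(RingFile) ->
        Ring = riak_core_ring_manager:read_ringfile(RingFile),
        Owners = riak_core_ring:all_owners(Ring),   %% [{PartitionIndex, OwnerNode}]
        Counts = lists:foldl(fun({_Idx, Node}, Acc) ->
                                     dict:update_counter(Node, 1, Acc)
                             end, dict:new(), Owners),
        %% e.g. [{'riak@node1',12},{'riak@node2',12},...,{'riak@node5',16}]
        lists:keysort(1, dict:to_list(Counts)).

Run it on the latest riak_core_ring.default.* file from the backup and on the one the new cluster wrote; if the counts (or the actual owner lists) differ, handoff will have to move data around once you restore.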

The good(tm) way to restore a complete backup to a new environment is to restore data and partition distribution together, i.e. both the data dirs and the ring files. For this purpose, reip was very useful in 1.3.x, where the node it was called on did not have to be running. Unfortunately, in 1.4.2 the reip'ed node must be running (which sort of defeats the purpose of reip):

"riak-1.3.2/rel/riak/bin> ./riak-admin reip bla bla2
Backed up existing ring file to "./data/ring/riak_core_ring.default.20131004091923.BAK"
New ring file written to "./data/ring/riak_core_ring.default.20131022105325"

riak-1.4.2/rel/riak/bin> ./riak-admin reip bla bla2
Node is not running!"

A common case where you need to reip non-running nodes is when you copy production data to a staging environment and need to ensure that your new staging cluster doesn't reference production nodes before firing it up. Does anyone know a good solution to this in 1.4? The only two ways I see are: 1) edit the ring files by hand, or 2) restore to a new cluster with potentially mismatched partition distribution between data and ring files, and wait for handoffs to complete.
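Regarding 1): you don't have to edit the file blindly. The ring file is just a term_to_binary'ed ring, so on a stopped node you can do by hand roughly what reip does internally (read the ring file, rename the node with riak_core_ring:rename_node/3, write it back). A rough sketch, assuming an Erlang shell with the Riak libs on the code path; the module name offline_reip is made up, and the riak_core calls are the ones I believe reip itself uses, so verify against your version before trusting it:

    %% offline_reip.erl - rough sketch of "editing the ring file by hand" on a stopped
    %% node, mimicking what riak-admin reip does internally: read the ring file,
    %% rename the node, write the result back. OldNode/NewNode are node-name atoms,
    %% e.g. 'riak@prod1' and 'riak@staging1'. Check the calls against your Riak version.
    -module(offline_reip).
    -export([rename/3]).

    rename(RingFile, OldNode, NewNode) ->
        Ring = riak_core_ring_manager:read_ringfile(RingFile),
        NewRing = riak_core_ring:rename_node(Ring, OldNode, NewNode),
        %% keep a copy of the original ring file before overwriting anything
        ok = file:rename(RingFile, RingFile ++ ".BAK"),
        ok = file:write_file(RingFile, term_to_binary(NewRing)),
        ok.

You would run it for each old/new node-name pair against each node's ring file while everything is down, then start the cluster. It is essentially the 1.3.x reip behaviour done manually, so treat it with the same care.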

- Rune, Trifork
