Re: Reip(ing) riak node created two copies in the cluster

Nitish Sharma Wed, 02 May 2012 10:08:15 -0700

On May 2, 2012, at 6:12 PM, Jon Meredith wrote:

> Hi Nitish, for this to work you'll have to stop all the nodes at the same 
> time, clear the ring on all nodes, start up all nodes, then rejoin.
> 
> If you clear the rings one node at a time, when you rejoin the nodes the ring 
> with the old and new style names will be gossiped back to it and you'll 
> still have both names.
Sorry for the confusion. I didn't clear the rings one node at a time while 
keeping other nodes live. These are the steps I followed:
1. Stop Riak on all the nodes.
2. Remove the ring directory from all nodes.
3. Start the nodes and rejoin.
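
To make the ordering of those steps explicit, here is a minimal sketch in Python that only builds the list of commands in the order they should run; it does not execute anything. The host names, the ring directory path (/var/lib/riak/ring), and the backup path are assumptions for illustration, not values from our cluster. It moves the ring files aside instead of deleting them, as suggested earlier in the thread, and uses only the "riak stop", "riak start" and "riak-admin join" commands.

```python
from typing import List, Tuple


def rebuild_plan(
    hosts: List[str],
    ring_dir: str = "/var/lib/riak/ring",        # assumed path, adjust per install
    backup_dir: str = "/var/lib/riak/ring.bak",  # hypothetical safe place
) -> List[Tuple[str, str]]:
    """Return (host, command) pairs in the order they should be run.

    Each phase finishes on every node before the next phase starts, so no
    node can gossip a ring containing the old names back to the others.
    """
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to form a cluster")

    plan: List[Tuple[str, str]] = []

    # Phase 1: stop Riak on all the nodes.
    for host in hosts:
        plan.append((host, "riak stop"))

    # Phase 2: move the ring files out of the way on all nodes.
    for host in hosts:
        plan.append((host, f"mkdir -p {backup_dir} && mv {ring_dir}/* {backup_dir}/"))

    # Phase 3: start all nodes; each comes up with a fresh single-node ring.
    for host in hosts:
        plan.append((host, "riak start"))

    # Phase 4: join every other node to the first one.
    seed = hosts[0]
    for host in hosts[1:]:
        plan.append((host, f"riak-admin join riak@{seed}"))

    return plan


if __name__ == "__main__":
    for host, command in rebuild_plan(["node1", "node2", "node3"]):
        print(f"{host}: {command}")
```

The point of generating the plan phase by phase is that no node is started until every ring directory has been cleared.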


> I didn't realize you had a large amount of data - originally you said 
> "Currently, we are hosting limited amount of data", but 200mil docs per node 
> seems like a fair amount.  Rebuilding that size cluster may take a long time.
> 
Yes, we are currently serving only a very limited amount of data because of the 
Riak outage. In total, we have almost 750 million documents served by Riak.
> Your options as I see them are
>   1) If you have backups of the ring files, you could revert the node name 
> changes and get the cluster stable again on riak@IP.  The ring files have a 
> timestamp associated with them, but we only keep a few of the last ring 
> files, so if enough gossip has happened then the pre-rename rings will have 
> been destroyed.  You will have to stop all nodes, put the ring files back as 
> they were before the change and fix the names in vm.args and then restart the 
> nodes.
> 
>   2) you can continue on the rebuild plan.  stop all nodes, set the new names 
> in vm.args, start the nodes again and rebuild the cluster, adding as many 
> nodes as you can at once so they rebalance at the same time.  When new nodes 
> are added the claimant node works out ownership changes and will start a 
> sequence of transfers.  If new nodes are added once a sequence is under way 
> the claimant will wait for that to complete, then check if there are any new 
> nodes and repeat until all nodes are assigned.  If you add all the nodes at 
> once you will do fewer transfers overall.
> 
> 
> If the cluster cannot be stopped, there are other things we might be able to 
> do, but they're a bit more complex.  What are your uptime requirements?
> 
We have stopped the cluster for now and are running on a small amount of data. We 
can wait for the partition re-distribution to complete on Riak, but I don't 
have a strong feeling about it. "member_status" doesn't give us a correct 
picture: http://pastie.org/3849548. Is this expected behavior? I should also 
mention that all the nodes are still loading existing data and it will take a few 
hours (2-3) until Riak KV is running on all of them.

Cheers
Nitish
> Jon
> 
> 
> 
> On Wed, May 2, 2012 at 9:57 AM, Nitish Sharma <[email protected]> 
> wrote:
> Hi Jon,
> Thanks for your input. I've already started working along those lines. 
> I stopped all the nodes, moved the ring directory on one node, brought that one 
> up, and issued a join command to one other node (after moving the ring 
> directory) - node2. While they were busy re-distributing the partitions, I 
> started another node (node3) and issued a join command (before riak_kv was 
> running, since it takes some time to load existing data).
> But after this, data handoffs are occurring only between node1 and node2. 
> "member_status" says that node 3 owns 0% of the ring and 0% are pending.
> We have a lot of data - each node serves around 200 million documents. The 
> Riak cluster is running 1.1.2.
> Any suggestions?
> 
> Cheers
> Nitish
> On May 2, 2012, at 5:31 PM, Jon Meredith wrote:
> 
>> Hi Nitish,
>> 
>> If you rebuild the cluster with the same ring size, the data will eventually 
>> get back to the right place.  While the rebuild is taking place you may have 
>> notfounds for gets until the data has been handed off to the newly assigned 
>> owner (as it will be secondary handoff, not primary ownership handoff to get 
>> the data back).  If you don't have a lot of data stored in the cluster it 
>> shouldn't take too long.
>> 
>> The process would be to stop all nodes, move the files out of the ring 
>> directory to a safe place, start all nodes and rejoin.  If you're using 
>> 1.1.x and you have capacity in your hardware you may want to increase 
>> handoff_concurrency to something like 4 to permit more transfers to happen 
>> across the cluster.
>> 
>> 
>> Jon.
>> 
>> 
>> 
>> On Wed, May 2, 2012 at 9:05 AM, Nitish Sharma <[email protected]> 
>> wrote:
>> Hi,
>> We have a 12-node Riak cluster. Until now we were naming every new node as 
>> riak@<ip_address>. We then decided to rename all the nodes to 
>> riak@<hostname>, which makes troubleshooting easier.
>> After issuing the reip command on two nodes, we noticed in the "status" that 
>> those 2 nodes were now appearing in the cluster with the old name as well as 
>> the new name. Other nodes were trying to handoff partitions to the "new" 
>> nodes, but apparently they were not able to. After this the whole cluster 
>> went down and completely stopped responding to any read/write requests.
>> member_status displayed old Riak name in "legacy" mode. Since this is our 
>> production cluster, we are desperately looking for some quick remedies. 
>> Issuing "force-remove" to the old names, restarting all the nodes, changing 
>> the riak names back to the old ones -  none of it helped.
>> Currently, we are hosting a limited amount of data. What's an elegant way to 
>> recover from this mess? Would shutting off all the nodes, deleting the ring 
>> directory, and again forming the cluster work?
>> 
>> Cheers
>> Nitish
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 
>> 
>> -- 
>> Jon Meredith
>> Platform Engineering Manager
>> Basho Technologies, Inc.
>> [email protected]
>> 
