Hi Jon,

Thanks for your input. I've already started working along those lines: I stopped all the nodes, moved the ring directory off one node, and brought that node up. Then I moved the ring directory on a second node (node2) and issued the join command from it. While those two were busy redistributing partitions, I started a third node (node3) and issued the join command before riak_kv was running on it (it takes some time to load the existing data).

Since then, handoffs are occurring only between node1 and node2. "member_status" says that node3 owns 0% of the ring and 0% is pending. We have a lot of data - each node serves around 200 million documents. The cluster is running Riak 1.1.2. Any suggestions?
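For reference, here is roughly the sequence I followed (a sketch, not verbatim - the ring path and hostnames are placeholders for our actual install, and wait-for-service is my assumption about the right way to confirm riak_kv is up before joining):

    # on each node being rebuilt; ring path depends on your install
    riak stop
    mv /var/lib/riak/ring /var/lib/riak/ring.bak
    riak start

    # assumption: confirm riak_kv is running before joining, since node3
    # was joined before riak_kv came up and now owns 0% of the ring
    riak-admin wait-for-service riak_kv riak@node3

    # run on the joining node (node2, node3, ...)
    riak-admin join riak@node1

    # watch progress
    riak-admin member_status
    riak-admin ring_status
    riak-admin transfers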
Cheers
Nitish

On May 2, 2012, at 5:31 PM, Jon Meredith wrote:

> Hi Nitish,
>
> If you rebuild the cluster with the same ring size, the data will eventually
> get back to the right place. While the rebuild is taking place you may see
> notfounds on gets until the data has been handed off to the newly assigned
> owner (as it will be secondary handoff, not primary ownership handoff, that
> gets the data back). If you don't have a lot of data stored in the cluster it
> shouldn't take too long.
>
> The process would be to stop all nodes, move the files out of the ring
> directory to a safe place, start all nodes, and rejoin. If you're using 1.1.x
> and you have capacity in your hardware, you may want to increase
> handoff_concurrency to something like 4 to permit more transfers to happen
> across the cluster.
>
> Jon.
>
> On Wed, May 2, 2012 at 9:05 AM, Nitish Sharma <[email protected]>
> wrote:
> Hi,
> We have a 12-node Riak cluster. Until now we were naming every new node
> riak@<ip_address>. We then decided to rename all the nodes to
> riak@<hostname>, which makes troubleshooting easier.
> After issuing the reip command to two nodes, we noticed in "status" that
> those two nodes were now appearing in the cluster under the old name as well
> as the new name. Other nodes were trying to hand off partitions to the "new"
> nodes, but apparently they were not able to. After this the whole cluster
> went down and completely stopped responding to any read/write requests.
> member_status displayed the old Riak names in "legacy" mode. Since this is
> our production cluster, we are desperately looking for some quick remedies.
> Issuing "force-remove" on the old names, restarting all the nodes, changing
> the Riak names back to the old ones - none of it helped.
> Currently we are hosting a limited amount of data. What's an elegant way to
> recover from this mess? Would shutting down all the nodes, deleting the ring
> directory, and forming the cluster again work?
>
> Cheers
> Nitish
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Jon Meredith
> Platform Engineering Manager
> Basho Technologies, Inc.
> [email protected]
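PS: for anyone else following this thread, my reading of Jon's handoff_concurrency suggestion, as a sketch (I'm assuming it belongs in the riak_core section of app.config and that a node restart is needed to pick it up):

    %% app.config (riak_core section) - allow more simultaneous
    %% partition transfers per node
    {riak_core, [
        {handoff_concurrency, 4}
        %% ...other riak_core settings...
    ]}

Presumably the same value can also be set at runtime from riak attach with application:set_env(riak_core, handoff_concurrency, 4)., though I haven't verified that it takes effect without a restart.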
