On Mon, 6 Jun 2016 at 16:52 Alex Moore <amo...@basho.com> wrote:

> Hi Jan,
>
> When you update the Kubernetes nodes, do you have to do them all at once
> or can they be done in a rolling fashion (one after another)?
>

Thanks for your reply.

Sadly, this is not possible. Kubernetes on GKE just tears all nodes down,
creates new nodes with the new Kubernetes version, and reschedules all services
on these nodes. So after an upgrade, all Riak nodes are stand-alone (when
starting after deleting /var/lib/riak/ring).
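
A quick way to confirm this on a pod (just a sketch, assuming riak-admin is
available inside the container) is to check the membership view; a stand-alone
node lists only itself:

riak-admin member-status   # a stand-alone node shows only the local node as a member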

Greetings

Jan


> If you can do them rolling-wise, you should be able to:
>
> For each node, one at a time:
> 1. Shut down Riak
> 2. Shut down/restart/upgrade Kubernetes
> 3. Start Riak
> 4. Use `riak-admin force-replace` to rename the old node name to the new
> node name
> 5. Repeat on remaining nodes.
>
> This is covered in the "Renaming Multi-node clusters
> <http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/changing-cluster-info/#rename-multi-node-clusters>"
> doc.
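>
> For example, if the old name was riak@10.44.2.8 and the pod came back up as
> riak@10.44.3.15 (addresses purely illustrative), step 4 would look roughly
> like this, run on the restarted node:
>
> riak-admin cluster force-replace riak@10.44.2.8 riak@10.44.3.15
> riak-admin cluster plan
> riak-admin cluster commit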
>
> As for your current predicament,  have you created any new buckets/changed
> bucket props in the default namespace since you restarted? Or have you only
> done regular operations since?
>
> Thanks,
> Alex
>
>
> On Mon, Jun 6, 2016 at 5:25 AM Jan-Philip Loos <maxda...@gmail.com> wrote:
>
>> Hi,
>>
>> we are using Riak in a Kubernetes cluster (on GKE). Sometimes it's
>> necessary to reboot the complete cluster to update the Kubernetes nodes.
>> This results in a complete shutdown of the Riak cluster, and the Riak nodes
>> are rescheduled with new IPs. So how can I handle this situation? How can
>> I form a new Riak cluster out of the old nodes with new names?
>>
>> The /var/lib/riak directory is persisted. I had to delete the
>> /var/lib/riak/ring folder (after saving the old ring state in a tar),
>> because otherwise "riak start" crashed with this message:
>>
>> {"Kernel pid
>>> terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['
>>> riak@10.44.2.8
>>> ',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
>>> Crash dump was written to: /var/log/riak/erl_crash.dump
>>> Kernel pid terminated (application_controller)
>>> ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['
>>> riak@10.44.2.8',
>>
>>
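>> On each node this amounted to roughly the following (a sketch; the backup
>> path is just an example, /var/lib/riak is the persisted data directory):
>>
>> tar -czf /var/lib/riak/ring-backup.tar.gz /var/lib/riak/ring
>> rm -rf /var/lib/riak/ring
>> riak start
>>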
>> Then I formed a new cluster via join & plan & commit.
>>
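>> For reference, the commands were along these lines (a sketch;
>> riak@<first-node> stands in for the new name of the node the others joined):
>>
>> riak-admin cluster join riak@<first-node>   # on each of the other nodes
>> riak-admin cluster plan                     # then once, on any node
>> riak-admin cluster commit
>>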
>> But now I have discovered problems with incomplete and inconsistent
>> partitions:
>>
>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>> 3064
>>
>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>> 2987
>>
>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>> 705
>>
>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>> 3064
>>
>> Is there a way to fix this? I guess this is caused by the missing old
>> ring state?
>>
>> Greetings
>>
>> Jan
>>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
