I'm not sure what is up, but my advice is to read the cluster state from the
node you are restarting.  That confirms the node is back up in the first
place, and you get that node's view of the cluster.
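
For example, something along these lines, run against the node that was just
restarted (a minimal sketch; it assumes the default HTTP port 9200 on
localhost):

# Ask the restarted node itself; this blocks until it reports the cluster as
# green or the timeout expires.
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=120s'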


Nik


On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks <mik...@gmail.com> wrote:

> That is exactly what I'm doing. For some reason the cluster reports as
> green even though an entire node is down. The cluster doesn't seem to
> notice the node is gone and change to yellow until many seconds later. By
> then my rolling restart script has already moved on to the second node and
> killed it, because the cluster was still reporting green.
>
>
> On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
>
>> Mike,
>>
>> Your script needs to check the status of the cluster before shutting down
>> a node, i.e. if the state is yellow, wait until it becomes green again
>> before shutting down the next node. You'll probably want to disable
>> allocation of shards while each node is being restarted (and enable it
>> again when the node comes back) in order to minimize the amount of data
>> that needs to be rebalanced.
>> Also make sure to have 'discovery.zen.minimum_master_nodes' correctly
>> set in your elasticsearch.yml file.
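>>
>> For a three-node cluster like this one, that works out to the usual
>> majority value (a sketch of the relevant elasticsearch.yml line;
>> 3 / 2 + 1 = 2):
>>
>> # elasticsearch.yml -- majority of 3 master-eligible nodes
>> discovery.zen.minimum_master_nodes: 2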
>>
>> Meta code
>>
>> for node in $cluster_nodes; do
>>     wait_for_cluster_status_green()
>>     cluster_disable_allocation()
>>     shutdown_node($node)
>>     wait_for_node_to_rejoin()
>>     cluster_enable_allocation()
>>     wait_for_cluster_status_green()
>> done
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
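>>
>> Fleshed out a bit, the loop above might look roughly like this in shell.
>> This is a sketch only: it assumes ES 1.x, HTTP on port 9200 on every node,
>> a three-node cluster, placeholder hostnames node1/node2/node3, and that
>> whatever supervises the process brings the node back after the _shutdown
>> call:
>>
>> for node in node1 node2 node3; do
>>     # Block until the cluster is green before touching this node.
>>     curl -s "http://$node:9200/_cluster/health?wait_for_status=green&timeout=30m" > /dev/null
>>
>>     # Disable shard allocation so the restart doesn't trigger rebalancing.
>>     curl -s -XPUT "http://$node:9200/_cluster/settings" \
>>          -d '{"transient": {"cluster.routing.allocation.enable": "none"}}' > /dev/null
>>
>>     # Stop the node via its local shutdown endpoint (ES 1.x only).
>>     curl -s -XPOST "http://$node:9200/_cluster/nodes/_local/_shutdown" > /dev/null
>>
>>     # ...the node is restarted here by whatever supervises it...
>>
>>     # Wait for the node's HTTP port to answer again, then for it to rejoin.
>>     until curl -s "http://$node:9200/" > /dev/null; do sleep 5; done
>>     curl -s "http://$node:9200/_cluster/health?wait_for_nodes=3&timeout=30m" > /dev/null
>>
>>     # Re-enable allocation and wait for green before the next node.
>>     curl -s -XPUT "http://$node:9200/_cluster/settings" \
>>          -d '{"transient": {"cluster.routing.allocation.enable": "all"}}' > /dev/null
>>     curl -s "http://$node:9200/_cluster/health?wait_for_status=green&timeout=30m" > /dev/null
>> done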
>>
>> /petter
>>
>>
>> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks <mik...@gmail.com> wrote:
>>
>>> What is the proper way of performing a rolling restart of a cluster? I
>>> currently have my stop script check for the cluster health to be green
>>> before stopping itself. Unfortunately this doesn't appear to be working.
>>>
>>> My setup:
>>> ES 1.0.0
>>> 3 node cluster w/ 1 replica.
>>>
>>> When I perform the rolling restart I see the cluster still reporting a
>>> green state when a node is down. In theory that should be a yellow state
>>> since some shards will be unallocated. My script output during a rolling
>>> restart:
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> ... continues as green for many more seconds...
>>>
>>> Since it is reporting as green, the second node thinks it can stop and
>>> ends up putting the cluster into a broken red state:
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>>
>>> My stop script issues a call to
>>> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node. Is
>>> it possible the other nodes are waiting for the down node to time out
>>> before moving into the yellow state? I would assume the shutdown API call
>>> would inform the other nodes that it is going down.
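>>>
>>> In curl terms the stop call is just this (sketch, assuming the default
>>> port):
>>>
>>> curl -s -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'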
>>>
>>> Appreciate any help on how to do this properly.
>>>
>>
