Re: Rolling restart
You have to re-enable allocation after the node comes back and wait for the shards to initialize there.

On Fri, Dec 19, 2014 at 3:23 PM, iskren.cher...@gmail.com wrote:

> I'm maintaining a small cluster of 9 nodes and was trying to perform a rolling restart as outlined here: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_rolling_restarts.html#_rolling_restarts
>
> The problem is that after I disable reallocation and restart a single node, it appears to lose all its shards indefinitely (until I turn reallocation back on). So if I do this for all nodes in the cluster I'll run out of primary shards at some point.
>
> I have an upstart task for Elasticsearch, so I stopped nodes with that (it sends SIGTERM). I tried the shutdown API, but it had the same effect: after the node joins the cluster, it doesn't own any shards, and that doesn't change if I wait for a while. Am I doing something wrong?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
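For anyone finding this thread later: with the ES 1.x cluster settings API, the disable/re-enable dance can be scripted roughly as below. This is a sketch, not a drop-in tool; `localhost:9200` and the choice of `transient` settings are assumptions, and `health_status` is just an illustrative helper.

```shell
# Before stopping a node: stop the master from reallocating its shards.
disable_allocation() {
  curl -s -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "cluster.routing.allocation.enable": "none" }
  }'
}

# After the node has rejoined: allow shards to initialize on it again.
# This is the step that has to happen per node, not once at the end,
# or the restarted node never gets its shards back.
enable_allocation() {
  curl -s -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "cluster.routing.allocation.enable": "all" }
  }'
}

# Crude helper: pull the "status" field out of a _cluster/health response.
health_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}
```

Between `enable_allocation` and moving on to the next node, poll `_cluster/health` with `health_status` until it prints `green`.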
Re: Rolling restart
On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote:
> You have to re-enable allocation after the node comes back and wait for the shards to initialize there.

So this means the tutorial is wrong (current version):

2. Disable allocation
3. Stop node
4. ...
5. Start node
6. Repeat steps 3-5 for the rest of your nodes
7. Re-enable shard allocation using ...

It should be:

2. Disable allocation
3. Stop node
4. ...
5. Start node
6. Enable allocation
7. Repeat steps 2-6 for the rest of your nodes
Re: Rolling restart
I believe so.

On Fri, Dec 19, 2014 at 3:39 PM, iskren.cher...@gmail.com wrote:
> So this means the tutorial is wrong (current version): [...] It should be: disable allocation, stop node, ..., start node, enable allocation, repeat steps 2-6 for the rest of your nodes.
Re: Rolling restart of a cluster?
Mike,

Your script needs to check the status of the cluster before shutting down a node, i.e. if the state is yellow, wait until it becomes green again before shutting down the next node. You'll probably want to disable allocation of shards while each node is being restarted (and enable it when the node comes back) in order to minimize the amount of data that needs to be rebalanced. Also make sure that 'discovery.zen.minimum_master_nodes' is correctly set in your elasticsearch.yml file.

Meta code:

    for node in $cluster_nodes; do
      if [ $cluster_status == 'green' ]; then
        cluster_disable_allocation()
        shutdown_node($node)
        wait_for_node_to_rejoin()
        cluster_enable_allocation()
        wait_for_cluster_status_green()
      fi
    done

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

/petter

On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks mik...@gmail.com wrote:

> What is the proper way of performing a rolling restart of a cluster? I currently have my stop script check for the cluster health to be green before stopping itself. Unfortunately this doesn't appear to be working.
>
> My setup: ES 1.0.0, 3 node cluster w/ 1 replica.
>
> When I perform the rolling restart I see the cluster still reporting a green state when a node is down. In theory that should be a yellow state since some shards will be unallocated.
> My script output during a rolling restart:
>
>     1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>     1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>     1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>     1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>     1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>     1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>     curl: (52) Empty reply from server
>     1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>     1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>     curl: (52) Empty reply from server
>     1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>     1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>     ... continues as green for many more seconds...
>
> Since it is reporting as green, the second node thinks it can stop and ends up putting the cluster into a broken red state:
>
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>     curl: (52) Empty reply from server
>     curl: (52) Empty reply from server
>     1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>
> My stop script issues a call to http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node. Is it possible the other nodes are waiting to timeout the down node before moving into the yellow state?
> I would assume the shutdown API call would inform the other nodes that it is going down. Appreciate any help on how to do this properly.
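Petter's meta code above could be fleshed out along the following lines. This is an untested sketch: the hostnames, passwordless ssh, and the `/etc/init.d/elasticsearch` init script are assumptions, and the JSON parsing is deliberately crude.

```shell
#!/bin/bash
# Rough bash realization of the rolling-restart meta code.
# Assumes one ES 1.x node per host, ssh access, and an init script.

es_status() {        # extract "status" from a _cluster/health response
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

set_allocation() {   # set_allocation <host> <none|all>
  curl -s -XPUT "http://$1:9200/_cluster/settings" \
    -d "{\"transient\":{\"cluster.routing.allocation.enable\":\"$2\"}}"
}

wait_for_green() {   # poll <host> until the cluster reports green
  until curl -s "http://$1:9200/_cluster/health" | es_status | grep -q green; do
    sleep 5
  done
}

rolling_restart() {  # rolling_restart host1 host2 ...
  for node in "$@"; do
    wait_for_green "$node"                              # only proceed from green
    set_allocation "$node" none                         # cluster_disable_allocation
    ssh "$node" sudo /etc/init.d/elasticsearch restart  # shutdown + start node
    until curl -s "http://$node:9200/" >/dev/null; do   # wait_for_node_to_rejoin
      sleep 1
    done
    set_allocation "$node" all                          # cluster_enable_allocation
    wait_for_green "$node"                              # wait_for_cluster_status_green
  done
}
```

As Ivan notes later in the thread, checking for green immediately after a shutdown can race with the master noticing the node loss, so a real script should also wait for the status to actually drop before waiting for green.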
Re: Rolling restart of a cluster?
I just used this to upgrade our labs environment a couple of days ago:

    #!/bin/bash
    export prefix=deployment-elastic0
    export suffix=.eqiad.wmflabs
    rm -f servers
    for i in {1..4}; do
        echo $prefix$i$suffix >> servers
    done
    cat > /tmp/commands <<__commands__
    wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
    sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
    curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
        "transient": { "cluster.routing.allocation.enable": "primaries" }
    }'
    sudo /etc/init.d/elasticsearch restart
    until curl -s localhost:9200/_cluster/health?pretty; do
        sleep 1
    done
    curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
        "transient": { "cluster.routing.allocation.enable": "all" }
    }'
    until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/health | grep green; do
        cat /tmp/health
        sleep 1
    done
    __commands__
    for server in $(cat servers); do
        scp /tmp/commands $server:/tmp/commands
        ssh $server bash /tmp/commands
    done

Production will swap wget and dpkg for apt-get update and apt-get install elasticsearch, but you get the idea. It isn't foolproof: if it dies it doesn't know how to start where it left off, and you might have to kill it if the cluster doesn't come back like you'd expect. It really only covers the "everything worked as expected" scenario. But it is nice when that happens.

Nik

On Wed, Apr 2, 2014 at 7:23 AM, Petter Abrahamsson pet...@jebus.nu wrote:
> Your script needs to check the status of the cluster before shutting down a node [...]
Re: Rolling restart of a cluster?
That is exactly what I'm doing. For some reason the cluster reports as green even though an entire node is down. The cluster doesn't seem to notice the node is gone and change to yellow until many seconds later. By then my rolling restart script has already gotten to the second node and killed it, because the cluster was still green for some reason.

On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
> Your script needs to check the status of the cluster before shutting down a node [...]
Re: Rolling restart of a cluster?
My scripts do a wait for yellow before waiting for green because, as you noticed, the cluster does not enter a yellow state immediately following a cluster (shutdown, replica change) event.

--
Ivan

On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks mik...@gmail.com wrote:
> That is exactly what I'm doing. For some reason the cluster reports as green even though an entire node is down. [...]
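Ivan's wait-for-yellow-then-green might look something like the sketch below. One caveat worth knowing: the health API's `wait_for_status` parameter only waits for the status to reach the requested level *or better*, so it can't be used to catch the drop out of green; that part has to be a polling loop. `localhost:9200` and the timeout are placeholders.

```shell
parse_status() {         # pull the "status" field out of health JSON
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

cluster_status() {
  curl -s 'localhost:9200/_cluster/health' | parse_status
}

wait_until_degraded() {  # block until the master has registered the node loss
  # Loop while the status is still "green" or empty (node not answering yet).
  until [ -n "$(cluster_status)" ] && [ "$(cluster_status)" != "green" ]; do
    sleep 1
  done
}

wait_until_green() {
  # Here the API can do the waiting server-side: this call returns once
  # the status reaches green (or the timeout expires).
  curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=10m' >/dev/null
}
```

Calling `wait_until_degraded` right after the shutdown, then `wait_until_green`, closes exactly the race Mike hit: the script can no longer kill the second node while the master still reports a stale green.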
Re: Rolling restart of a cluster?
I'm not sure what is up, but my advice is to make sure you read the cluster state from the node you are restarting. That'll make sure it is up in the first place, and you'll get that node's view of the cluster.

Nik

On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks mik...@gmail.com wrote:
> That is exactly what I'm doing. For some reason the cluster reports as green even though an entire node is down. [...]
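One way to apply Nik's suggestion: aim the health check at the node that was just restarted instead of at a fixed host, so a refused connection simply means "not up yet" and any response you do get reflects that node's own view of the cluster. A sketch; `$node`, the 5-second timeout, and the helper names are made up here.

```shell
is_green() {        # succeeds iff stdin contains a green health status
  grep -q '"status" *: *"green"'
}

node_sees_green() { # node_sees_green <host>
  # -f makes curl fail on HTTP errors; --max-time keeps the poll loop snappy.
  curl -sf --max-time 5 "http://$1:9200/_cluster/health" | is_green
}

# Usage after restarting $node:
#   until node_sees_green "$node"; do sleep 2; done
```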