Re: Rolling restart

2014-12-19 Thread Nikolas Everett
You have to reenable allocation after the node comes back and wait for the
shards to initialize there.
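Nikolas's fix boils down to two settings calls around each restart. A minimal sketch, assuming Elasticsearch 1.x answering on localhost:9200 and the `cluster.routing.allocation.enable` setting; `allocation_body` is a hypothetical helper, not part of any API:

```shell
# Hypothetical helper: build the transient-settings body for a given
# allocation mode ("none" disables allocation, "all" re-enables it).
allocation_body() {
  printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Before stopping the node:
#   curl -XPUT localhost:9200/_cluster/settings -d "$(allocation_body none)"
# ...restart the node and wait for it to rejoin, then:
#   curl -XPUT localhost:9200/_cluster/settings -d "$(allocation_body all)"
allocation_body none
```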

On Fri, Dec 19, 2014 at 3:23 PM, iskren.cher...@gmail.com wrote:

 I'm maintaining a small cluster of 9 nodes, and was trying to perform
 rolling restart as outlined here:
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_rolling_restarts.html#_rolling_restarts

 The problem is that after I disable reallocation and restart a single
 node, it appears it loses all its shards indefinitely (until I re-enable
 reallocation). So if I do this for all nodes in the cluster, I'll run out of
 primary shards at some point.

 I have an upstart task for Elasticsearch, so I stopped nodes with that (it
 sends SIGTERM). I also tried the shutdown API, but it had the same effect --
 after the node rejoins the cluster, it doesn't own any shards, and that
 doesn't change if I wait for a while.

 Am I doing something wrong?

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/c9eef95e-d7cf-4278-a99f-89d9ab878791%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.




Re: Rolling restart

2014-12-19 Thread iskren.chernev


On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote:

 You have to reenable allocation after the node comes back and wait for the 
 shards to initialize there.


So this means the tutorial is wrong (current version):

2. Disable allocation
3. stop node
4. ...
5. start node
6. Repeat 3-5 for the rest of your nodes
7. Re-enable shard allocation using ...

It should be:

2. disable allocation
3. stop node
4. ...
5. start node
6. enable allocation
7. repeat steps 2-6 for the rest of your nodes 
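The corrected ordering, with disable/enable inside the per-node loop, can be dry-run as a sketch. Everything here is a placeholder: `rolling_restart` takes a runner command so the step sequence can be previewed with `echo`; in real use each step would be the curl/ssh command discussed in the thread.

```shell
# Dry-runnable sketch of the corrected per-node ordering (step names
# are placeholders, not real commands).
rolling_restart() {
  local run=$1; shift
  for node in "$@"; do
    $run disable_allocation      # PUT _cluster/settings ... "none"
    $run restart_node "$node"    # e.g. restart the service on $node
    $run wait_for_rejoin "$node" # poll the node until it answers
    $run enable_allocation       # PUT _cluster/settings ... "all"
    $run wait_for_green          # GET _cluster/health?wait_for_status=green
  done
}

rolling_restart echo node1 node2   # prints the planned steps per node
```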



Re: Rolling restart

2014-12-19 Thread Nikolas Everett
I believe so.

On Fri, Dec 19, 2014 at 3:39 PM, iskren.cher...@gmail.com wrote:







Re: Rolling restart of a cluster?

2014-04-02 Thread Petter Abrahamsson
Mike,

Your script needs to check the status of the cluster before shutting
down a node, i.e. if the state is yellow, wait until it becomes green again
before shutting down the next node. You'll probably want to disable
allocation of shards while each node is being restarted (re-enable it when
the node comes back) in order to minimize the amount of data that needs to
be rebalanced.
Also make sure to have 'discovery.zen.minimum_master_nodes' correctly set
in your elasticsearch.yml file.

Meta code

for node in $cluster_nodes; do
  if [ $cluster_status == 'green' ]; then
cluster_disable_allocation()
shutdown_node($node)
wait_for_node_to_rejoin()
cluster_enable_allocation()
wait_for_cluster_status_green()
  fi
done
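One way the helpers in the meta code could be fleshed out is with a small parser over the health API's JSON. This is a sketch under the assumption that `_cluster/health` returns a top-level `"status"` field; `cluster_status` is a hypothetical helper that reads JSON from stdin so it can be exercised without a live cluster:

```shell
# Extract the "status" field ("green"/"yellow"/"red") from a
# _cluster/health JSON document supplied on stdin.
cluster_status() {
  sed -n 's/.*"status"[ ]*:[ ]*"\([a-z]*\)".*/\1/p'
}

# Real use (assumes ES on localhost:9200):
#   until [ "$(curl -s localhost:9200/_cluster/health | cluster_status)" = green ]; do
#     sleep 1
#   done
echo '{"cluster_name":"dev_cluster","status":"yellow"}' | cluster_status
```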

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

/petter


On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks mik...@gmail.com wrote:

 What is the proper way of performing a rolling restart of a cluster? I
 currently have my stop script check for the cluster health to be green
 before stopping itself. Unfortunately this doesn't appear to be working.

 My setup:
 ES 1.0.0
 3 node cluster w/ 1 replica.

 When I perform the rolling restart I see the cluster still reporting a
 green state when a node is down. In theory that should be a yellow state
 since some shards will be unallocated. My script output during a rolling
 restart:
 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0

 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0

 curl: (52) Empty reply from server
 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0

 curl: (52) Empty reply from server
 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
 ... continues as green for many more seconds...

 Since it is reporting as green, the second node thinks it can stop and
 ends up putting the cluster into a broken red state:
 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0

 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530

 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530

 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530

 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530

 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530

 curl: (52) Empty reply from server
 curl: (52) Empty reply from server
 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046

 My stop script issues a call to
 http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
 Is it possible the other nodes are waiting to timeout the down node before
 moving into the yellow state? I would assume the shutdown API call would
 inform the other nodes that it is going down.

 Appreciate any help on how to do this properly.





Re: Rolling restart of a cluster?

2014-04-02 Thread Nikolas Everett
I just used this to upgrade our labs environment a couple of days ago:

#!/bin/bash

export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {1..4}; do
    echo $prefix$i$suffix >> servers
done

cat <<__commands__ > /tmp/commands
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "primaries"
    }
}'
sudo /etc/init.d/elasticsearch restart
until curl -s localhost:9200/_cluster/health?pretty; do
    sleep 1
done
curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/health | grep green; do
    cat /tmp/health
    sleep 1
done
__commands__

for server in $(cat servers); do
    scp /tmp/commands $server:/tmp/commands
    ssh $server bash /tmp/commands
done



Production will swap the wget and dpkg steps for apt-get update and apt-get
install elasticsearch, but you get the idea.

It isn't foolproof.  If it dies it doesn't know how to pick up where it left
off, and you might have to kill it if the cluster doesn't come back like
you'd expect.  It really only covers the "everything worked out as
expected" scenario.  But it is nice when that happens.

Nik


On Wed, Apr 2, 2014 at 7:23 AM, Petter Abrahamsson pet...@jebus.nu wrote:


Re: Rolling restart of a cluster?

2014-04-02 Thread Mike Deeks
That is exactly what I'm doing. For some reason the cluster reports as 
green even though an entire node is down. The cluster doesn't seem to 
notice the node is gone and change to yellow until many seconds later. By 
then my rolling restart script has already gotten to the second node and 
killed it because the cluster was still green for some reason.

On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:






Re: Rolling restart of a cluster?

2014-04-02 Thread Ivan Brusic
My scripts do a wait for yellow before waiting for green because, as you
noticed, the cluster does not enter a yellow state immediately following a
cluster event (shutdown, replica change).
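The health API can also do the waiting server-side: `wait_for_status` blocks the request until the cluster reaches at least the given level. `health_url` is a hypothetical helper; the host and timeout values are assumptions:

```shell
# Build a health URL that blocks until the cluster reaches the given
# status, or until the timeout expires.
health_url() {
  printf 'http://%s:9200/_cluster/health?wait_for_status=%s&timeout=%s' "$1" "$2" "$3"
}

# e.g. first wait for yellow, then for green, as suggested above:
#   curl -s "$(health_url localhost yellow 60s)"
#   curl -s "$(health_url localhost green 10m)"
health_url localhost yellow 60s
```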

-- 
Ivan


On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks mik...@gmail.com wrote:




Re: Rolling restart of a cluster?

2014-04-02 Thread Nikolas Everett
I'm not sure what is up but my advice is to make sure you read the cluster
state from the node you are restarting.  That'll make sure it is up in the
first place and you'll get that node's view of the cluster.
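Polling the restarting node directly might look like the sketch below. `node_url` is a hypothetical helper, and `local=true` (which asks the node to answer from its own cluster state rather than forwarding to the master) plus the host and port are assumptions:

```shell
# Health URL aimed at one specific node, answered from that node's own
# view of the cluster rather than the master's.
node_url() {
  printf 'http://%s:9200/_cluster/health?local=true' "$1"
}

# Real use: block until the restarted node is up and answering.
#   until curl -s "$(node_url es-node-3)" >/dev/null; do sleep 1; done
node_url es-node-3
```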


Nik


On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks mik...@gmail.com wrote:
