Disabling allocation is definitely a temporary change only; you can set it back once your upgrades are done.
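To make the "temporary" part concrete, here is a minimal Python sketch of toggling the setting through the cluster update settings API. It assumes the node is reachable over HTTP at a base URL you supply (the `http://localhost:9200` in the comment is only a placeholder), and the function names are hypothetical helpers, not part of any Elasticsearch client.

```python
import json
import urllib.request


def allocation_settings_body(disabled: bool) -> bytes:
    """Build the JSON body that toggles shard allocation as a transient setting."""
    settings = {
        "transient": {
            "cluster.routing.allocation.disable_allocation": disabled
        }
    }
    return json.dumps(settings).encode("utf-8")


def set_allocation_disabled(base_url: str, disabled: bool) -> dict:
    """PUT the transient setting to the cluster settings endpoint.

    base_url is an assumption, e.g. "http://localhost:9200".
    """
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=allocation_settings_body(disabled),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

You would call `set_allocation_disabled(base_url, True)` before taking a node down and `set_allocation_disabled(base_url, False)` once the upgrade is finished. Because the setting is sent as transient, it does not survive a full cluster restart.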
Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 9 January 2014 02:47, Jenny Sivapalan <jennifer.sivapa...@gmail.com> wrote:

> Thanks both for the replies. Our rebalance process doesn't take too long
> (~5 mins per node). I had some of the plugins (head, paramedic, bigdesk)
> open as I was closing down the old nodes and didn't see any split brain
> issue, although I agree we can lead ourselves down this route by doubling
> the instances. We want our cluster to rebalance as we bring nodes in and
> out, so disabling is not going to work for us unless I'm misunderstanding?
>
> On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:
>>
>> You can also use cluster.routing.allocation.disable_allocation to reduce
>> the need to wait for things to rebalance.
>>
>> Regards,
>> Mark Walkom
>>
>> On 8 January 2014 04:41, Ivan Brusic <iv...@brusic.com> wrote:
>>
>>> Although elasticsearch should support clusters of nodes with different
>>> minor versions, I have seen issues between minor versions. Version 0.90.8
>>> did contain an upgrade of Lucene (4.6), but that does not look like it
>>> would cause your issue. You could look at the github issues tagged
>>> 0.90.[8-9] and see if something applies in your case.
>>>
>>> A couple of points about upgrading:
>>>
>>> If you want to use the double-the-nodes technique (which should not be
>>> necessary for minor version upgrades), you could "decommission" a node
>>> using the Shard API. Here is a good writeup:
>>> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
>>>
>>> Since you doubled the number of nodes in the cluster, the
>>> minimum_master_nodes setting would be temporarily incorrect and
>>> potential split-brain clusters might occur.
>>> In fact, it might have occurred in your case since the cluster state
>>> seems incorrect. Merely hypothesizing.
>>>
>>> Cheers,
>>>
>>> Ivan
>>>
>>> On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan <jennifer....@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We've upgraded Elastic Search twice over the last month and have
>>>> experienced downtime (roughly 8 minutes) during the roll out. I'm not
>>>> sure if it is something we are doing wrong or not.
>>>>
>>>> We use EC2 instances for our Elastic Search cluster and CloudFormation
>>>> to manage our stack. When we deploy a new version or change to Elastic
>>>> Search we upload the new artefact, double the number of EC2 instances
>>>> and wait for the new instances to join the cluster.
>>>>
>>>> For example, 6 nodes form a cluster on v 0.90.7. We upload the 0.90.9
>>>> version via our deployment process and double the number of nodes for
>>>> the cluster (12). The 6 new nodes will join the cluster with the 0.90.9
>>>> version.
>>>>
>>>> We then want to remove each of the 0.90.7 nodes. We do this by shutting
>>>> down the node (using the head plugin), waiting for the cluster to
>>>> rebalance the shards and then terminating the EC2 instance. Then we
>>>> repeat with the next node. We leave the master node until last so that
>>>> it does the re-election just once.
>>>>
>>>> The issue we have found in the last two upgrades is that while the
>>>> penultimate node is shutting down, the master starts throwing errors and
>>>> the cluster goes red. To fix this we've stopped the Elastic Search
>>>> process on the master and have had to restart each of the other nodes
>>>> (though perhaps they would have rebalanced themselves in a longer time
>>>> period?). We find that we send an increased number of error responses to
>>>> our clients during this time.
>>>>
>>>> We've set our queue size for search to 300 and we start to see the
>>>> queue get full:
>>>>
>>>>     at java.lang.Thread.run(Thread.java:724)
>>>> 2014-01-07 15:58:55,508 DEBUG action.search.type [Matt Murdock] [92036651] Failed to execute fetch phase
>>>> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
>>>>     at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
>>>>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>>>>
>>>> But we also see the following error, which we've been unable to find
>>>> a diagnosis for:
>>>>
>>>> 2014-01-07 15:58:55,530 DEBUG index.shard.service [Matt Murdock] [index-name][4] Can not build 'doc stats' from engine shard state [RECOVERING]
>>>> org.elasticsearch.index.shard.IllegalIndexShardStateException: [index-name][4] CurrentState[RECOVERING] operations only allowed when started/relocated
>>>>     at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)
>>>>
>>>> Are we doing anything wrong or has anyone experienced this?
>>>>
>>>> Thanks,
>>>> Jenny
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b2328296-e9c9-4763-b61b-6ad2e145e59b%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
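
On the minimum_master_nodes point raised in the quoted thread: the usual guidance is to set `discovery.zen.minimum_master_nodes` to a majority of the master-eligible nodes, i.e. (n / 2) + 1 with integer division. The sketch below only does that arithmetic; it assumes every node in the cluster is master-eligible, which the thread does not state, and the function name is a hypothetical helper.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the majority quorum for a given number of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then add one: a strict majority of the nodes.
    return master_eligible_nodes // 2 + 1


# Assumption: all 6 original nodes and all 6 new nodes are master-eligible.
before_doubling = minimum_master_nodes(6)   # 4
after_doubling = minimum_master_nodes(12)   # 7
```

With the value left at 4 while 12 nodes are running, two groups of 4 or more nodes could each elect a master if they lost sight of each other, which is the split-brain risk described above.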