Perhaps I am missing some functionality since I am still on version 0.90.2, but wouldn't you have to disable/enable allocation after each server restart during a rolling upgrade? A restarted node will not host any shards with allocation disabled.
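[Editor's sketch: the disable/re-enable cycle described above can be scripted. This is a hypothetical Python helper that only builds the request body for PUT /_cluster/settings; it assumes the 0.90-era setting name discussed in this thread and does not contact a cluster.]

```python
import json

# Setting discussed in this thread; on the 0.90.x line it is a boolean.
# Later releases replaced it with cluster.routing.allocation.enable.
SETTING = "cluster.routing.allocation.disable_allocation"

def allocation_settings(disabled):
    """Body for PUT /_cluster/settings: disable (True) or re-enable (False)
    shard allocation around each node restart of a rolling upgrade."""
    return {"transient": {SETTING: disabled}}

# Disable before stopping a node, re-enable once it has rejoined,
# so the restarted node can receive shards again.
print(json.dumps(allocation_settings(True)))
print(json.dumps(allocation_settings(False)))
```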
Cheers,

Ivan

On Wed, Jan 8, 2014 at 5:48 PM, Mark Walkom <ma...@campaignmonitor.com> wrote:

> Disabling allocation is definitely a temporary-only change; you can set it
> back once your upgrades are done.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 9 January 2014 02:47, Jenny Sivapalan <jennifer.sivapa...@gmail.com> wrote:
>
>> Thanks both for the replies. Our rebalance process doesn't take too long
>> (~5 mins per node). I had some of the plugins (head, paramedic, bigdesk)
>> open as I was closing down the old nodes and didn't see any split-brain
>> issue, although I agree we can lead ourselves down this route by doubling
>> the instances. We want our cluster to rebalance as we bring nodes in and
>> out, so disabling is not going to work for us unless I'm misunderstanding?
>>
>>
>> On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:
>>>
>>> You can also use cluster.routing.allocation.disable_allocation to
>>> reduce the need of waiting for things to rebalance.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 8 January 2014 04:41, Ivan Brusic <iv...@brusic.com> wrote:
>>>
>>>> Although elasticsearch should support clusters of nodes with different
>>>> minor versions, I have seen issues between minor versions. Version 0.90.8
>>>> did contain an upgrade of Lucene (4.6), but that does not look like it
>>>> would cause your issue. You could look at the GitHub issues tagged
>>>> 0.90.[8-9] and see if something applies in your case.
>>>>
>>>> A couple of points about upgrading:
>>>>
>>>> If you want to use the double-the-nodes technique (which should not be
>>>> necessary for minor version upgrades), you could "decommission" a node
>>>> using the Shard API. Here is a good writeup:
>>>> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
>>>>
>>>> Since you doubled the number of nodes in the cluster,
>>>> the minimum_master_nodes setting would be temporarily incorrect and
>>>> potential split-brain clusters might occur. In fact, it might have occurred
>>>> in your case since the cluster state seems incorrect. Merely hypothesizing.
>>>>
>>>> Cheers,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan <jennifer....@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We've upgraded Elasticsearch twice over the last month and have
>>>>> experienced downtime (roughly 8 minutes) during the rollout. I'm not sure
>>>>> if it is something we are doing wrong or not.
>>>>>
>>>>> We use EC2 instances for our Elasticsearch cluster and CloudFormation
>>>>> to manage our stack. When we deploy a new version of or a change to
>>>>> Elasticsearch, we upload the new artefact, double the number of EC2
>>>>> instances and wait for the new instances to join the cluster.
>>>>>
>>>>> For example, 6 nodes form a cluster on v0.90.7. We upload the 0.90.9
>>>>> version via our deployment process and double the number of nodes in the
>>>>> cluster (12). The 6 new nodes will join the cluster with the 0.90.9
>>>>> version.
>>>>>
>>>>> We then want to remove each of the 0.90.7 nodes. We do this by
>>>>> shutting down the node (using the head plugin), waiting for the cluster to
>>>>> rebalance the shards and then terminating the EC2 instance, then repeating
>>>>> with the next node. We leave the master node until last so that it does the
>>>>> re-election just once.
>>>>>
>>>>> The issue we have found in the last two upgrades is that while the
>>>>> penultimate node is shutting down, the master starts throwing errors and
>>>>> the cluster goes red.
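[Editor's sketch: Ivan's two upgrade points — draining a node before removing it, and keeping minimum_master_nodes at a strict majority while the node count changes — can be illustrated with hypothetical Python helpers. Only request bodies and arithmetic are shown; `cluster.routing.allocation.exclude._ip` is the kind of allocation-filtering setting commonly used for decommissioning.]

```python
def quorum(master_eligible_nodes):
    """discovery.zen.minimum_master_nodes that avoids split brain:
    a strict majority of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def decommission_body(node_ip):
    """Body for PUT /_cluster/settings telling the allocator to move
    all shards off the node with this IP before it is shut down."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": node_ip}}

# Doubling a 6-node cluster to 12 raises the safe quorum from 4 to 7,
# which is why a fixed minimum_master_nodes is temporarily wrong.
print(quorum(6), quorum(12))
print(decommission_body("10.0.0.5"))
```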
>>>>> To fix this we've stopped the Elasticsearch process on the
>>>>> master and have had to restart each of the other nodes (though perhaps
>>>>> they would have rebalanced themselves given a longer time period?). We find
>>>>> that we send an increased number of error responses to our clients during
>>>>> this time.
>>>>>
>>>>> We've set our queue size for search to 300 and we start to see the
>>>>> queue fill up:
>>>>>
>>>>> at java.lang.Thread.run(Thread.java:724)
>>>>> 2014-01-07 15:58:55,508 DEBUG action.search.type [Matt Murdock]
>>>>> [92036651] Failed to execute fetch phase
>>>>> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
>>>>> rejected execution (queue capacity 300) on
>>>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
>>>>> at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
>>>>> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>>>>>
>>>>> But we also see the following error, which we've been unable to find
>>>>> the diagnosis for:
>>>>>
>>>>> 2014-01-07 15:58:55,530 DEBUG index.shard.service [Matt
>>>>> Murdock] [index-name][4] Can not build 'doc stats' from engine shard state
>>>>> [RECOVERING]
>>>>> org.elasticsearch.index.shard.IllegalIndexShardStateException:
>>>>> [index-name][4] CurrentState[RECOVERING] operations only allowed when
>>>>> started/relocated
>>>>> at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)
>>>>>
>>>>> Are we doing anything wrong or has anyone experienced this?
>>>>>
>>>>> Thanks,
>>>>> Jenny
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/b2328296-e9c9-4763-b61b-6ad2e145e59b%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
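[Editor's sketch: since the shards in this thread were still in state RECOVERING when the cluster went red, each node removal could be gated on cluster health rather than on what the head plugin shows. A hypothetical Python check over the standard GET /_cluster/health response fields; parsing only, no cluster contact.]

```python
import json

def safe_to_remove_next_node(health_body):
    """Parse a GET /_cluster/health response body and report whether the
    next old node can be taken out: every shard assigned, none moving."""
    health = json.loads(health_body)
    return (health.get("status") == "green"
            and health.get("relocating_shards", 0) == 0
            and health.get("initializing_shards", 0) == 0)

# Poll this between shutting down a node and terminating its EC2 instance.
sample = '{"status": "green", "relocating_shards": 0, "initializing_shards": 0}'
print(safe_to_remove_next_node(sample))
```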