Re: Upgrades causing Elastic Search downtime

Mark Walkom Wed, 08 Jan 2014 17:49:33 -0800

Disabling allocation is definitely a temporary-only change; you can set it
back once your upgrades are done.
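
For reference, here is a minimal sketch of toggling the setting around an
upgrade. It assumes the 0.90-era cluster update settings API (a PUT to
/_cluster/settings with a "transient" block) and a node reachable at
localhost:9200, which is a placeholder address and not something taken from
this thread. The helper names are illustrative only.

```python
import json
import urllib.request


def allocation_body(disabled):
    """Build the transient cluster-settings body for the allocation toggle."""
    return json.dumps({
        "transient": {
            "cluster.routing.allocation.disable_allocation": bool(disabled)
        }
    })


def set_allocation(disabled, host="http://localhost:9200"):
    """PUT the setting to a node. The host default is a placeholder."""
    req = urllib.request.Request(
        host + "/_cluster/settings",
        data=allocation_body(disabled).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The idea would be to call set_allocation(True) before stopping a node and
set_allocation(False) once its replacement has joined. Because the setting is
transient, it does not survive a full cluster restart.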


Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 9 January 2014 02:47, Jenny Sivapalan <jennifer.sivapa...@gmail.com> wrote:

> Thanks both for the replies. Our rebalance process doesn't take too long
> (~5 mins per node). I had some of the plugins (head, paramedic, bigdesk)
> open as I was closing down the old nodes and didn't see any split brain
> issue although I agree we can lead ourselves down this route by doubling
> the instances. We want our cluster to rebalance as we bring nodes in and
> out so disabling is not going to work for us unless I'm misunderstanding?
>
>
> On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:
>>
>> You can also use cluster.routing.allocation.disable_allocation to reduce
>> the need to wait for things to rebalance.
>>
>> Regards,
>> Mark Walkom
>>
>>
>> On 8 January 2014 04:41, Ivan Brusic <iv...@brusic.com> wrote:
>>
>>> Although elasticsearch should support clusters of nodes with different
>>> minor versions, I have seen issues between minor versions. Version 0.90.8
>>> did contain an upgrade of Lucene (4.6), but that does not look like it
>>> would cause your issue. You could look at the github issues tagged
>>> 0.90.[8-9] and see if something applies in your case.
>>>
>>> A couple of points about upgrading:
>>>
>>> If you want to use the double-the-nodes technique (which should not be
>>> necessary for minor version upgrades), you could "decommission" a node
>>> using the Shard API. Here is a good writeup:
>>> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
>>>
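One way to sketch the decommissioning step is with the allocation filtering
settings, on the assumption that this is what the writeup above refers to.
The body below would be sent as a PUT to /_cluster/settings; the IP address
is a hypothetical old node, and the function name is illustrative only.

```python
import json


def exclude_node_body(ip_address):
    """Settings body asking the cluster to move shards off the given IP."""
    return json.dumps({
        "transient": {
            "cluster.routing.allocation.exclude._ip": ip_address
        }
    })


# Hypothetical address of one old 0.90.7 node.
print(exclude_node_body("10.0.0.1"))
```

Once the excluded node holds no shards, it can be shut down without the
cluster having to recover anything from replicas.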
>>> Since you doubled the amount of nodes in the cluster,
>>> the minimum_master_nodes setting would be temporarily incorrect and
>>> potential split-brain clusters might occur. In fact, it might have occurred
>>> in your case since the cluster state seems incorrect. Merely hypothesizing.
>>>
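To illustrate why the setting becomes incorrect, here is the usual quorum
rule (half the master-eligible nodes, rounded down, plus one) applied to the
node counts in this thread. It assumes every node is master-eligible, which
the thread does not state.

```python
def minimum_master_nodes(master_eligible):
    """Quorum: more than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


for nodes in (6, 12):
    print(nodes, "nodes ->", minimum_master_nodes(nodes))
# prints:
# 6 nodes -> 4
# 12 nodes -> 7
```

A value of 4 is a majority of 6 nodes but not of 12, so while the cluster is
doubled, two groups of nodes could each satisfy it and elect their own master.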
>>> Cheers,
>>>
>>> Ivan
>>>
>>>
>>> On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan
>>> <jennifer....@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We've upgraded Elastic Search twice over the last month and have
>>>> experienced downtime (roughly 8 minutes) during the roll out. I'm not sure
>>>> if it is something we are doing wrong or not.
>>>>
>>>> We use EC2 instances for our Elastic Search cluster and CloudFormation
>>>> to manage our stack. When we deploy a new version or change to Elastic
>>>> Search we upload the new artefact, double the number of EC2 instances and
>>>> wait for the new instances to join the cluster.
>>>>
>>>> For example 6 nodes form a cluster on v 0.90.7. We upload the 0.90.9
>>>> version via our deployment process and double the number of nodes for the
>>>> cluster (12). The 6 new nodes will join the cluster with the 0.90.9
>>>> version.
>>>>
>>>> We then want to remove each of the 0.90.7 nodes. We do this by shutting
>>>> down the node (using the head plugin), waiting for the cluster to rebalance
>>>> the shards and then terminating the EC2 instance. We then repeat with the
>>>> next node. We leave the master node until last so that re-election happens
>>>> just once.
>>>>
>>>> The issue we have found in the last two upgrades is that while the
>>>> penultimate node is shutting down the master starts throwing errors and the
>>>> cluster goes red. To fix this we've stopped the Elastic Search process on
>>>> master and have had to restart each of the other nodes (though perhaps they
>>>> would have rebalanced themselves in a longer time period?). We find
>>>> that we send an increased number of error responses to our clients during
>>>> this time.
>>>>
>>>> We've set our queue size for search to 300 and we start to see the
>>>> queue getting full:
>>>>        at java.lang.Thread.run(Thread.java:724)
>>>> 2014-01-07 15:58:55,508 DEBUG action.search.type        [Matt Murdock] [92036651] Failed to execute fetch phase
>>>> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
>>>>         at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
>>>>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>>>>
>>>>
>>>> We also see the following error, which we've been unable to diagnose:
>>>> 2014-01-07 15:58:55,530 DEBUG index.shard.service       [Matt Murdock] [index-name][4] Can not build 'doc stats' from engine shard state [RECOVERING]
>>>> org.elasticsearch.index.shard.IllegalIndexShardStateException: [index-name][4] CurrentState[RECOVERING] operations only allowed when started/relocated
>>>>         at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)
>>>>
>>>>  Are we doing anything wrong or has anyone experienced this?
>>>>
>>>> Thanks,
>>>> Jenny
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/b2328296-e9c9-4763-b61b-6ad2e145e59b%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>

