Perhaps I am missing some functionality since I am still on version 0.90.2, but wouldn't you have to disable/enable allocation after each server restart during a rolling upgrade? A restarted node will not host any shards with allocation disabled.
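[Editor's sketch: the disable/re-enable cycle described above can be scripted. This is a hypothetical Python helper that only builds the request body for PUT /_cluster/settings; it assumes the 0.90-era setting name discussed in this thread and does not contact a cluster.]

```python
import json

# Setting discussed in this thread; on the 0.90.x line it is a boolean.
# Later releases replaced it with cluster.routing.allocation.enable.
SETTING = "cluster.routing.allocation.disable_allocation"

def allocation_settings(disabled):
    """Body for PUT /_cluster/settings: disable (True) or re-enable (False)
    shard allocation around each node restart of a rolling upgrade."""
    return {"transient": {SETTING: disabled}}

# Disable before stopping a node, re-enable once it has rejoined,
# so the restarted node can receive shards again.
print(json.dumps(allocation_settings(True)))
print(json.dumps(allocation_settings(False)))
```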
Cheers,

Ivan

On Wed, Jan 8, 2014 at 5:48 PM, Mark Walkom <ma...@campaignmonitor.com> wrote:

> Disabling allocation is definitely a temporary-only change; you can set it
> back once your upgrades are done.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 9 January 2014 02:47, Jenny Sivapalan <jennifer.sivapa...@gmail.com> wrote:
>
>> Thanks both for the replies. Our rebalance process doesn't take too long
>> (~5 mins per node). I had some of the plugins (head, paramedic, bigdesk)
>> open as I was closing down the old nodes and didn't see any split-brain
>> issue, although I agree we can lead ourselves down this route by doubling
>> the instances. We want our cluster to rebalance as we bring nodes in and
>> out, so disabling is not going to work for us unless I'm misunderstanding?
>>
>>
>> On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:
>>>
>>> You can also use cluster.routing.allocation.disable_allocation to
>>> reduce the need of waiting for things to rebalance.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 8 January 2014 04:41, Ivan Brusic <iv...@brusic.com> wrote:
>>>
>>>> Although elasticsearch should support clusters of nodes with different
>>>> minor versions, I have seen issues between minor versions. Version 0.90.8
>>>> did contain an upgrade of Lucene (4.6), but that does not look like it
>>>> would cause your issue. You could look at the GitHub issues tagged
>>>> 0.90.[8-9] and see if something applies in your case.
>>>>
>>>> A couple of points about upgrading:
>>>>
>>>> If you want to use the double-the-nodes technique (which should not be
>>>> necessary for minor version upgrades), you could "decommission" a node
>>>> using the Shard API. Here is a good writeup:
>>>> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
>>>>
>>>> Since you doubled the number of nodes in the cluster,
>>>> the minimum_master_nodes setting would be temporarily incorrect and
>>>> potential split-brain clusters might occur. In fact, it might have occurred
>>>> in your case since the cluster state seems incorrect. Merely hypothesizing.
>>>>
>>>> Cheers,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan <jennifer....@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We've upgraded Elasticsearch twice over the last month and have
>>>>> experienced downtime (roughly 8 minutes) during the rollout. I'm not sure
>>>>> if it is something we are doing wrong or not.
>>>>>
>>>>> We use EC2 instances for our Elasticsearch cluster and CloudFormation
>>>>> to manage our stack. When we deploy a new version of or a change to
>>>>> Elasticsearch, we upload the new artefact, double the number of EC2
>>>>> instances and wait for the new instances to join the cluster.
>>>>>
>>>>> For example, 6 nodes form a cluster on v0.90.7. We upload the 0.90.9
>>>>> version via our deployment process and double the number of nodes in the
>>>>> cluster (12). The 6 new nodes will join the cluster with the 0.90.9
>>>>> version.
>>>>>
>>>>> We then want to remove each of the 0.90.7 nodes. We do this by
>>>>> shutting down the node (using the head plugin), waiting for the cluster to
>>>>> rebalance the shards and then terminating the EC2 instance, then repeating
>>>>> with the next node. We leave the master node until last so that it does the
>>>>> re-election just once.
>>>>>
>>>>> The issue we have found in the last two upgrades is that while the
>>>>> penultimate node is shutting down, the master starts throwing errors and
>>>>> the cluster goes red.
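[Editor's sketch: Ivan's two upgrade points — draining a node before removing it, and keeping minimum_master_nodes at a strict majority while the node count changes — can be illustrated with hypothetical Python helpers. Only request bodies and arithmetic are shown; `cluster.routing.allocation.exclude._ip` is the kind of allocation-filtering setting commonly used for decommissioning.]

```python
def quorum(master_eligible_nodes):
    """discovery.zen.minimum_master_nodes that avoids split brain:
    a strict majority of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def decommission_body(node_ip):
    """Body for PUT /_cluster/settings telling the allocator to move
    all shards off the node with this IP before it is shut down."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": node_ip}}

# Doubling a 6-node cluster to 12 raises the safe quorum from 4 to 7,
# which is why a fixed minimum_master_nodes is temporarily wrong.
print(quorum(6), quorum(12))
print(decommission_body("10.0.0.5"))
```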
>>>>> To fix this we've stopped the Elasticsearch process on the
>>>>> master and have had to restart each of the other nodes (though perhaps
>>>>> they would have rebalanced themselves given a longer time period?). We find
>>>>> that we send an increased number of error responses to our clients during
>>>>> this time.
>>>>>
>>>>> We've set our queue size for search to 300 and we start to see the
>>>>> queue fill up:
>>>>>
>>>>> at java.lang.Thread.run(Thread.java:724)
>>>>> 2014-01-07 15:58:55,508 DEBUG action.search.type [Matt Murdock]
>>>>> [92036651] Failed to execute fetch phase
>>>>> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
>>>>> rejected execution (queue capacity 300) on
>>>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
>>>>> at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
>>>>> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>>>>>
>>>>> But we also see the following error, which we've been unable to find
>>>>> the diagnosis for:
>>>>>
>>>>> 2014-01-07 15:58:55,530 DEBUG index.shard.service [Matt
>>>>> Murdock] [index-name][4] Can not build 'doc stats' from engine shard state
>>>>> [RECOVERING]
>>>>> org.elasticsearch.index.shard.IllegalIndexShardStateException:
>>>>> [index-name][4] CurrentState[RECOVERING] operations only allowed when
>>>>> started/relocated
>>>>> at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)
>>>>>
>>>>> Are we doing anything wrong or has anyone experienced this?
>>>>>
>>>>> Thanks,
>>>>> Jenny
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/b2328296-e9c9-4763-b61b-6ad2e145e59b%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
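[Editor's sketch: since the shards in this thread were still in state RECOVERING when the cluster went red, each node removal could be gated on cluster health rather than on what the head plugin shows. A hypothetical Python check over the standard GET /_cluster/health response fields; parsing only, no cluster contact.]

```python
import json

def safe_to_remove_next_node(health_body):
    """Parse a GET /_cluster/health response body and report whether the
    next old node can be taken out: every shard assigned, none moving."""
    health = json.loads(health_body)
    return (health.get("status") == "green"
            and health.get("relocating_shards", 0) == 0
            and health.get("initializing_shards", 0) == 0)

# Poll this between shutting down a node and terminating its EC2 instance.
sample = '{"status": "green", "relocating_shards": 0, "initializing_shards": 0}'
print(safe_to_remove_next_node(sample))
```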