Re: Upgrades causing Elastic Search downtime

Ivan Brusic Tue, 07 Jan 2014 09:42:03 -0800

Although elasticsearch should support clusters of nodes with different minor
versions, I have seen issues between minor versions. Version 0.90.8 did
contain an upgrade of Lucene (4.6), but that does not look like it would
cause your issue. You could look at the github issues tagged 0.90.[8-9] and
see if something applies in your case.


A couple of points about upgrading:

If you want to use the double-the-nodes technique (which should not be
necessary for minor version upgrades), you could "decommission" a node
using the shard allocation settings. Here is a good writeup:
http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
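To make the decommissioning idea concrete, here is a minimal sketch in Python. It assumes the node is excluded by IP through the cluster.routing.allocation.exclude._ip setting, sent as a transient update to the /_cluster/settings endpoint. The function names, the base URL and the IP addresses are placeholders of mine, not something from your setup.

```python
import json
import urllib.request


def exclusion_settings(node_ips):
    """Build a transient cluster-settings body that moves shards off the given IPs."""
    return {
        "transient": {
            # Comma-separated list of node IPs that should no longer hold shards.
            "cluster.routing.allocation.exclude._ip": ",".join(node_ips)
        }
    }


def decommission(base_url, node_ips):
    """PUT the exclusion to the cluster settings endpoint and return the parsed reply."""
    body = json.dumps(exclusion_settings(node_ips)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Something like decommission("http://localhost:9200", ["10.0.0.1"]) would ask the cluster to relocate shards away from that (hypothetical) node. Once the node holds no shards, it can be shut down without the cluster having to recover anything.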

Since you doubled the number of nodes in the cluster,
the minimum_master_nodes setting would be temporarily incorrect and
a split-brain cluster might occur. In fact, it might have occurred
in your case since the cluster state seems incorrect. I am merely hypothesizing.
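The arithmetic behind that concern, as a small Python sketch. It assumes every node is master-eligible and that minimum_master_nodes was set to a majority of the original 6 nodes; I do not know the value you actually configured, and the function names are mine.

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def split_brain_possible(master_eligible_nodes, minimum_master_nodes):
    """True if two disjoint groups could each reach minimum_master_nodes."""
    return 2 * minimum_master_nodes <= master_eligible_nodes


print(quorum(6))                    # 4
print(quorum(12))                   # 7
print(split_brain_possible(6, 4))   # False
print(split_brain_possible(12, 4))  # True
print(split_brain_possible(12, 7))  # False
```

With 12 nodes and the setting still at 4, two separate groups of nodes could each elect a master. Raising the value to 7 while the cluster is doubled, and lowering it again as the old nodes leave, would close that window.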

Cheers,

Ivan


On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan <
jennifer.sivapa...@gmail.com> wrote:

> Hello,
>
> We've upgraded Elastic Search twice over the last month and have
> experienced downtime (roughly 8 minutes) during the roll out. I'm not sure
> if it is something we are doing wrong or not.
>
> We use EC2 instances for our Elastic Search cluster and cloud formation to
> manage our stack. When we deploy a new version or change to Elastic Search
> we upload the new artefact, double the number of EC2 instances and wait for
> the new instances to join the cluster.
>
> For example 6 nodes form a cluster on v 0.90.7. We upload the 0.90.9
> version via our deployment process and double the number of nodes for the
> cluster (12). The 6 new nodes will join the cluster with the 0.90.9
> version.
>
> We then want to remove each of the 0.90.7 nodes. We do this by shutting
> down the node (using the head plugin), wait for the cluster to rebalance
> the shards and then terminate the EC2 instances. Then repeat with the next
> node. We leave the master node until last so that it does the re-election
> just once.
>
> The issue we have found in the last two upgrades is that while the
> penultimate node is shutting down the master starts throwing errors and the
> cluster goes red. To fix this we've stopped the Elastic Search process on
> master and have had to restart each of the other nodes (though perhaps they
> would have rebalanced themselves in a longer time period?). We find that
> we send an increased number of error responses to our clients during this time.
>
> We've set our queue size for search to 300 and we start to see the queue
> get full:
>        at java.lang.Thread.run(Thread.java:724)
> 2014-01-07 15:58:55,508 DEBUG action.search.type        [Matt Murdock]
> [92036651] Failed to execute fetch phase
> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
> rejected execution (queue capacity 300) on
> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
>         at
> org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
>         at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>
>
> But also we see the following error which we've been unable to find the
> diagnosis for:
> 2014-01-07 15:58:55,530 DEBUG index.shard.service       [Matt Murdock]
> [index-name][4] Can not build 'doc stats' from engine shard state
> [RECOVERING]
> org.elasticsearch.index.shard.IllegalIndexShardStateException:
> [index-name][4] CurrentState[RECOVERING] operations only allowed when
> started/relocated
>         at
> org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)
>
> Are we doing anything wrong or has anyone experienced this?
>
> Thanks,
> Jenny
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b2328296-e9c9-4763-b61b-6ad2e145e59b%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

