Re: Stress Free Guide To Expanding a Cluster
Try setting "indices.recovery.max_bytes_per_sec" much higher for faster recovery. The default is 20mb/s, and there's a bug in versions prior to 1.2 that rate-limits to even lower than that. You didn't specify how big your indices are, but with that parameter you can fairly accurately predict how long it'll take for the cluster to go green.

mike
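P.S. That setting can be changed on a live cluster through the cluster settings API. A minimal sketch, assuming a 1.x cluster reachable on localhost:9200 (the 100mb value is only an example; pick whatever your disks and network can actually sustain):

    # Raise the recovery throttle cluster-wide; no restart needed.
    # "transient" settings reset on a full cluster restart -- use
    # "persistent" instead if you want them to stick.
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": {
        "indices.recovery.max_bytes_per_sec": "100mb"
      }
    }'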
Re: Stress Free Guide To Expanding a Cluster
On Wed, Jun 25, 2014 at 8:05 AM, James Carr wrote:

> I launched two new EC2 instances to join the cluster and watched. Some
> shards began relocating, no big deal. Six hours later I checked in and
> some shards were still relocating, one shard was recovering. Weird but
> whatever... the cluster health is still green and searches are working
> fine.

I add new nodes every once in a while and it can take a few hours for everything to balance out, but six hours is a bit long. It's possible, though. Do you have graphs of the count of relocating shards? Something like this can really help you figure out whether everything balanced out at some point and then became unbalanced again. Example:
http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Elasticsearch%20cluster%20eqiad&h=elastic1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1403698335&v=0&m=es_relocating_shards&vl=shards&ti=es_relocating_shards&z=large

> Then I got an alert at 2:30am that the cluster state is now yellow and
> found that we have 3 shards marked as recovering and 2 shards
> unassigned. The cluster still technically works, but 24 hours after the
> new nodes were added I feel like my only choice to get a green cluster
> again will be to simply launch 5 fresh nodes and replay all the data
> from backups into it. Ugh.

This sounds like one of the nodes bounced. It can take a long time to recover from that; it's something that is being worked on. Check the logs and see if you see anything about it.

One thing to make sure of is that you set the minimum number of master nodes (discovery.zen.minimum_master_nodes) correctly on all nodes. If you have five master-eligible nodes, set it to 3. If the two new nodes aren't master eligible (so you have three master-eligible nodes), set it to 2.

> SERIOUSLY! What can I do to prevent this? I feel like I am missing
> something, because I always heard the strength of elasticsearch is its
> ease of scaling out, but it feels like every time I try it falls to the
> floor. :-(

It's always been pretty painless for me. I did have trouble when I added nodes that were broken: one time I added nodes without SSDs to a cluster with SSDs. Another time I didn't set the heap size on the new nodes, and they worked until some shards moved to them. Then they fell over.

Nik
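P.S. For concreteness, here's roughly what that looks like -- a sketch assuming a 1.x cluster with zen discovery, with localhost:9200 standing in for a real host:

    # In elasticsearch.yml on every node, set it to a quorum of
    # master-eligible nodes, i.e. (master_eligible / 2) + 1:
    #
    #   discovery.zen.minimum_master_nodes: 3
    #
    # It's also a dynamic setting, so it can be changed on the live
    # cluster without a restart:
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "persistent": {
        "discovery.zen.minimum_master_nodes": 3
      }
    }'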
Stress Free Guide To Expanding a Cluster
Earlier this week we discovered that our three-node elasticsearch cluster needed to be expanded, as it was getting dangerously close to maximum capacity. I was nervous about this and read up as best I could on best practices for doing it. The only information I seemed to be able to find was to ensure that the new nodes cannot be elected as masters when they join, to avoid a split-brain scenario. Fair enough.

I launched two new EC2 instances to join the cluster and watched. Some shards began relocating, no big deal. Six hours later I checked in and some shards were still relocating, one shard was recovering. Weird but whatever... the cluster health is still green and searches are working fine.

Then I got an alert at 2:30am that the cluster state is now yellow and found that we have 3 shards marked as recovering and 2 shards unassigned. The cluster still technically works, but 24 hours after the new nodes were added I feel like my only choice to get a green cluster again will be to simply launch 5 fresh nodes and replay all the data from backups into it. Ugh.

SERIOUSLY! What can I do to prevent this? I feel like I am missing something, because I always heard the strength of elasticsearch is its ease of scaling out, but it feels like every time I try it falls to the floor. :-(

Thanks!
James
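P.S. In case it helps, these are the kinds of checks I've been using to watch the shard state (localhost:9200 stands in for our real hosts, and the _cat API assumes a 1.x cluster):

    # Overall status plus counts of relocating / initializing /
    # unassigned shards:
    curl 'http://localhost:9200/_cluster/health?pretty'

    # Per-shard view: which shards are RELOCATING, INITIALIZING, or
    # UNASSIGNED, and on which nodes:
    curl 'http://localhost:9200/_cat/shards?v' | grep -v STARTED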