That is correct, I was mixing the terms "nodes" and "shards" (sorry about that). I'm running the test on a single node (machine). I chose 20 shards so we could eventually grow to a 20-server cluster without re-indexing. It's unlikely we'll ever need to go that high, but you never know, and given that we receive 750 million messages a day, the thought of reindexing after collecting a year's worth of data makes me nervous. If I can "over-shard" and avoid a massive reindex then I'll be a happy guy.
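For context, the per-day template I'm describing looks roughly like this (a sketch against the ES 1.0 `_template` API, installed with `curl -XPUT 'localhost:9200/_template/messages_template' -d @template.json`; the template name and the replica count are just placeholders, not our exact settings):

```json
{
  "template": "messages_*",
  "settings": {
    "number_of_shards": 20,
    "number_of_replicas": 1
  }
}
```

Every index matching `messages_*` (e.g. `messages_20140101`) then picks up the 20-shard setting automatically at creation time.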
I thought about reducing the 20 shards, but even if I go to, say, 5 shards on 5 machines (1 shard per machine?), I'll still run into the issue if a user searches several years back. Any other thoughts on a possible solution? Would increasing the queue size be a good option? Is there a downside (performance hit, running out of resources, etc.)? Thanks again!

On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote:

> You are mixing nodes and shards, right?
> How many elasticsearch nodes do you have to manage your 7300 shards?
> Why did you set 20 shards per index?
>
> You can increase the queue size in elasticsearch.yml but I'm not sure it's the right thing to do here.
>
> My 2 cents
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> Le 26 févr. 2014 à 01:36, Alex Clark <al...@bitstew.com> a écrit :
>
> Hello all, I'm getting failed nodes when running searches and I'm hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is "messages_20140101", for January 2 is "messages_20140102", and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0.
>
> When I search for all messages in a year (either using an alias or specifying "messages_2013*"), I get many failed nodes. The reason given is: "EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]".
>
> The more often I search, the fewer failed nodes I get (probably caching in ES), but I can't get down to 0 failed nodes. I'm using ES for analytics, so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures.
> We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200.
>
> If I restrict the search to 30 days then I do not see any failures (it's under 1000 nodes, so as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures?
>
> Thank you for your help!
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9bf6d3bb-34e5-44c4-8d76-24f868d283a0%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
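In case it helps frame the queue-size question: in ES 1.x the search thread pool queue (the "queue capacity 1000" in the exception above) can be raised in elasticsearch.yml along these lines. This is only a sketch; the value 5000 is an arbitrary example, and a bigger queue mainly trades rejections for memory held by queued shard requests, so it papers over the shard count rather than fixing it:

```yaml
# elasticsearch.yml (ES 1.x setting names)
# Raise the search thread pool queue above the default of 1000.
threadpool.search.queue_size: 5000
```

Rejections can then be watched with `curl 'localhost:9200/_cat/thread_pool?v'` (the _cat APIs are available as of ES 1.0) to see whether the rejected count keeps climbing during the year-wide searches.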