It sounds like we're going to need to test the upper bound on the number of indexes we can support (with no data in them) to see how many the cluster can carry. We may need to re-evaluate our plan of one index per application. We might be better off creating a statically sized set of indexes and then consistently hashing our applications onto those indexes. Thank you for your help!
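Roughly what I have in mind, as a Python sketch. The bucket count and index naming scheme are placeholders, nothing we've settled on:

    import hashlib

    NUM_INDEXES = 64  # placeholder: fixed when the cluster is built

    def index_for_app(app_id):
        """Map an application ID onto one of a fixed set of indexes."""
        # md5 gives a stable hash; Python's built-in hash() can vary
        # across processes and versions, so it's unsafe for routing.
        digest = hashlib.md5(app_id.encode("utf-8")).hexdigest()
        return "apps-%03d" % (int(digest, 16) % NUM_INDEXES)

    print(index_for_app("app-1234"))  # always the same index for this app

With a truly static index set, plain modulo hashing like this is enough; a real consistent-hash ring only earns its keep if we ever need to grow the set without remapping every application.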
On Fri, Sep 26, 2014 at 5:44 PM, joergpra...@gmail.com <joergpra...@gmail.com> wrote:

> If you consider tens of thousands of indices on tens of thousands of nodes, and the master node is the only node that can write to the cluster state, it will have a lot of work to do to keep up with all cluster state updates.
>
> When the rate of changes to the cluster state increases, the master node will be challenged to propagate the state changes to all other nodes reliably and fast enough. There is no "distributed tree" yet; for example, there are no special "forwarder nodes" that communicate the cluster state to a partitioned set of nodes.
>
> See https://github.com/elasticsearch/elasticsearch/issues/6186
>
> The cluster state is a big compressed JSON structure which must also fit into the heap memory of master-eligible nodes. ES also uses privileged network channels for cluster state communication, so it takes precedence over ordinary index/search messaging. But all these precautions may not be enough at some point. You can observe that point by retrieving the growing cluster state through the cluster state API and measuring the size and duration of the request.
>
> On the other hand, you can have a calm cluster state and many thousands of indices when the type mappings are constant, no field updates occur, and no nodes connect/disconnect. It all depends on how you have to operate the cluster. One important thing is to allocate enough resources to master-eligible, data-less nodes so they are not hindered by extra search/index load.
>
> N.B. 20 nodes is not a big cluster. There are ES clusters of hundreds and thousands of nodes. From my understanding, communication between the master and 20 nodes is not a serious problem. This becomes an issue at ~500-1000 nodes.
>
> Jörg
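The measurement Jörg suggests is easy to script. A minimal Python 3 sketch, assuming a node reachable at localhost:9200 (the URL is a placeholder):

    import json
    import time
    import urllib.request

    # Placeholder: point this at any node in the cluster.
    URL = "http://localhost:9200/_cluster/state"

    start = time.time()
    with urllib.request.urlopen(URL) as resp:
        body = resp.read()
    elapsed = time.time() - start

    state = json.loads(body)
    indices = state.get("metadata", {}).get("indices", {})
    # Both size and fetch time grow with the cluster state; watch the trend.
    print("%.1f KB, %.2f s, %d indices"
          % (len(body) / 1024.0, elapsed, len(indices)))

Running this periodically while adding empty indexes should show when the cluster state starts getting unwieldy.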
On Sat, Sep 27, 2014 at 1:12 AM, Todd Nine <tn...@apigee.com> wrote:

>> Hi Jörg,
>> We're storing each application in its own index so we can manage it independently of the others. There's no set load or usage pattern for our applications: some will be very small, a few hundred documents; others will be quite large, in the billions. We have no way of knowing what the usage profile will be. Rather, we're initially thinking that expansion will occur through a combination of additional indexes and aliases referencing those indexes. This lets us automate the management of the aliases and indexes, and in turn scale them to the needs of the application without over-allocating unused capacity. For instance, for write-heavy applications we can allocate more shards (via alias); for read-heavy applications we can allocate more replicas.
>>
>> We're not running our cluster on a single node. Our cluster is small to begin with: 6 nodes in our current POC. Ultimately I expect each cluster we stand up to grow to 20 or so nodes. We'll expand as necessary to support the number of shards and replicas and keep our performance up. I'm not particularly worried about our ability to scale horizontally with our hardware.
>>
>> Rather, I'm concerned with how far we can scale the number of indexes, and how that relates to the number of machines. When we keep adding hardware, does that raise the upper bound on the number of indexes we can have? Not the physical shards and replicas, but the routing information for each shard's primary and the locations of its replicas.
>>
>> I've done distributed data storage for many years, and none of the ES documentation makes it clear whether this becomes an issue operationally. I'm leery of just assuming it will "just work". When implementing something like this, you either build a distributed tree for your metadata to get the partitioning you need to scale indefinitely, or every node must store every shard's placement information. How does it work in ES?
>>
>> Thanks,
>> Todd
>>
>> On Friday, September 26, 2014 4:29:53 PM UTC-6, Jörg Prante wrote:
>>>
>>> Why do you want to create a huge number of indexes on just a single node?
>>>
>>> There are smarter methods to scale. Use over-allocation of shards. This is explained by kimchy in this thread:
>>> http://elasticsearch-users.115913.n3.nabble.com/Over-allocation-of-shards-td3673978.html
>>>
>>> TL;DR: you can create many thousands of aliases on a single index (or a few indices) with just a few shards. There is no limit defined by ES; when your configuration or hardware capacity is exceeded, you will see the node getting sluggish.
>>>
>>> Jörg
>>>
>>> On Fri, Sep 26, 2014 at 11:23 PM, Todd Nine <tn...@apigee.com> wrote:
>>>
>>>> Hey guys. We're building a multi-tenant application where users create applications within our single server. For our current ES scheme, we're building an index per application. Are there any stress tests or documentation on the upper bound of the number of indexes a cluster can handle? From my current understanding of metadata and routing, every node caches the metadata of all the indexes and shards for routing. At some point this will obviously overwhelm the node. Is my current understanding correct, or is this information partitioned across the cluster as well?
>>>>
>>>> Thanks,
>>>> Todd
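For reference, the over-allocation approach Jörg points at above comes down to filtered, routed aliases on a shared index. A minimal Python 3 sketch against the standard _aliases endpoint; the node address, shared index name, and app_id tenant field are placeholders:

    import json
    import urllib.request

    ES = "http://localhost:9200"  # placeholder node address

    def add_tenant_alias(shared_index, app_id):
        """Give one tenant a filtered, routed alias on a shared index."""
        actions = {"actions": [{"add": {
            "index": shared_index,
            "alias": app_id,
            # Searches through the alias see only this tenant's documents.
            "filter": {"term": {"app_id": app_id}},
            # All of this tenant's documents are routed to a single shard.
            "routing": app_id,
        }}]}
        req = urllib.request.Request(
            ES + "/_aliases",
            data=json.dumps(actions).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(resp.read().decode("utf-8"))

    add_tenant_alias("apps-000", "app-1234")  # placeholder names

Clients then index and search against the alias as if it were a dedicated index; the one catch is that every document has to carry the app_id field the filter relies on.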