@mauri, thank you for such an interesting analysis.

On 21/03/2014 1:01 PM, "Mauri" <ma...@proactive-edge.com.au> wrote:
> Hi Brad
>
> I agree with what Mark and Zachary have said and will expand on these.
>
> Firstly, shard and index level operations in Elasticsearch are peer-to-peer. Single-shard operations will affect at most 2 nodes: the node receiving the request and the node hosting an instance of the shard (primary or replica depending on the operation). Multi-shard operations (such as searches) will affect from one to (N + 1) nodes, where N is the number of shards in the index.
>
> So from an index/shard operation perspective there is no reason to split into two clusters. The key issue with index/shard operations is that the cluster is able to handle the traffic volume. So if you do decide to split out into two clusters you will need to look at the load profile for each of your client types to determine how much raw processing power you need in each cluster. It may be that a 10:20 split is closer to optimal than a 15:15 split between clusters to balance request traffic, and therefore CPU utilisation, across all nodes. If you go with one cluster this is not an issue because you can move shards between nodes to balance the request traffic.
>
> Larger clusters also imply more work for the cluster master in managing the cluster. This comes down to the number of nodes that the master has to communicate with, and manage, and the size of the cluster state. A cluster with 30 nodes is not too large for a master to keep track of. There will be an increase in network traffic associated with the increase in volume of master-to-worker and worker-to-master pings used to detect the presence/absence of nodes. This can be offset by lengthening the respective ping intervals, so that pings are sent less often.
>
> In a large cluster it is good practice to have a group of dedicated master nodes, say 3, from which the master is elected. These nodes do not host any user data, meaning that cluster management is not compromised by high user request traffic.
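
To make the fan-out rule above concrete for myself, I put it into a small sketch. The function name `nodes_touched` and its parameters are my own invention, not anything from the Elasticsearch API, and the cap at the cluster size is my assumption (a request cannot involve more nodes than the cluster has).

```python
from typing import Tuple


def nodes_touched(num_shards: int, num_nodes: int, multi_shard: bool) -> Tuple[int, int]:
    """Return (min, max) nodes involved in one request.

    Hypothetical helper based on the rule of thumb quoted above:
    single-shard operations touch at most 2 nodes, multi-shard
    operations touch between 1 and N + 1 nodes for N shards.
    """
    if num_shards < 1 or num_nodes < 1:
        raise ValueError("num_shards and num_nodes must be positive")
    # Lower bound is 1: the receiving node may itself host the shard copy.
    if not multi_shard:
        return 1, min(2, num_nodes)
    # Assumption: the upper bound cannot exceed the number of nodes.
    return 1, min(num_shards + 1, num_nodes)


if __name__ == "__main__":
    # 100-shard "shared" index versus 6-shard "dedicated" index on 30 nodes.
    print(nodes_touched(100, 30, multi_shard=True))
    print(nodes_touched(6, 30, multi_shard=True))
```

If I read it right, a search on a 100-shard index can involve every node of a 30-node cluster, while a search on a 6-shard index involves at most 7.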
>
> The size of the cluster state may be more of an issue. The cluster state comprises all of the information about the cluster configuration. The cluster state has records for each node, index, document mapping, shard, etc. Whenever there is a change to the cluster state it is first made by the master, which then sends the updated cluster state to each worker node. Note that the entire cluster state is sent, not just the changes! It is therefore highly desirable to limit the frequency of changes to the cluster state, primarily by minimizing dynamic field mapping updates, and the overall size of the cluster state, primarily by minimizing the number of indices.
>
> In your proposed model the size of the cluster state associated with the set of 60 shared month indices will be larger than that of one set of 60 dedicated month indices by virtue of having 100 shards rather than 6. However, it may not be much bigger because there will be much more metadata associated with defining the index structure, notably the field mappings for all document types in the index, than the metadata defining the shards of the index. So it may well be that the size of the cluster state associated with 60 "shared" month indices plus N sets of 60 "dedicated" indices is not much more than that of (N + 1) sets of 60 "dedicated" indices. So there may not be much point in splitting to two clusters. A quick way to look at this for your actual data model is to:
>
> 1. Set up an index in ES with mappings for all document types and 6 shards and 0 replicas,
> 2. Retrieve the index metadata JSON using the ES admin API,
> 3. Increase the number of replicas to 16 (102 shards total),
> 4. Retrieve the index metadata JSON using the ES admin API,
> 5. Compare the two JSON documents from 2 and 4.
>
> As stated above it is desirable to minimize the number of indices.
> Each shard is a Lucene index which consumes memory and requires open file descriptors from the OS for segment data files and Lucene index level files. You may find yourself running out of memory and/or file descriptors if you are not careful.
>
> I understand you are looking for a design that will cater for on-disc data volume. Given that your data is split into monthly indices it may well be that no one index, either "shared" or "dedicated", will reach that volume in one month. There may also be seasonal factors to consider whereby one or two months have much higher volumes than others. I have read/heard about cases where a monthly index architecture was implemented but later scrapped for a single index approach because the month-to-month variation in volume was detrimental to overall system resource utilisation and performance.
>
> In your case think about whether monthly indices are really appropriate. An alternative model is to partition one year's worth of data into a set of indices bounded by size rather than time. In this model a new index is started on Jan 01 and data is added to it until it reaches some predefined size limit, at which point a new index is created to accept new data from that point on. This is repeated until year end, at which point you might end up with data for Jan/Feb/Mar and half of Apr in index 2014-01, the rest of Apr plus May - Oct in index 2014-02 and Nov/Dec in index 2014-03. This way you end up using the smallest number of indices within the constraints of manageable shard size and overall user data volume. This may also be a better approach than indices with 100 shards. This does, however, come at the cost of more complexity when it comes to accessing data across multiple indices, but this is a one-off development cost rather than an ongoing maintenance cost.
>
> Also, rather than focusing so much on shard size, look at the number and size of the segment files comprising a shard.
> Remember that Lucene segments are essentially immutable (except for marking deletes) after they are written to disc. This means that some index management operations may reduce to simply copying/moving one or two segment files across the network, rather than all segment files for a given shard. For example, when allocating shards to nodes Elasticsearch tries to allocate shards to nodes that already have some of that shard's segment data. The new incremental backup/restore in ES V1 also takes advantage of this segment immutability. With this in mind you might be able to support shards larger than 5G consisting of more segment files of bounded size rather than fewer segment files of unbounded size (and ultimately a single 5G segment file).
>
> Hope this helps,
> Regards
> Mauri
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5072d2c9-a418-4afc-82e6-d2b8926d82c1%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
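
To check my understanding of the size-bounded index model described in the quoted message, here is a minimal planning sketch. Everything in it is hypothetical: `plan_indices`, the monthly volumes and the size limit are made up for illustration, and it only does arithmetic on expected volumes rather than talking to a cluster. It also assumes data arrives in month order and that a month may be split across two indices, as in the Apr example.

```python
from typing import Dict, List, Sequence, Tuple


def plan_indices(year: int, monthly_gb: Sequence[float],
                 limit_gb: float) -> List[Tuple[str, Dict[int, float]]]:
    """Pack months, in order, into indices capped at limit_gb each.

    Returns (index_name, {month: gb_stored_in_that_index}) pairs, with
    names such as "2014-01", "2014-02" (a sequence number, not a month).
    """
    if limit_gb <= 0:
        raise ValueError("limit_gb must be positive")
    plan: List[Dict[int, float]] = [{}]
    used = 0.0
    for month, volume in enumerate(monthly_gb, start=1):
        remaining = float(volume)
        while remaining > 0:
            if used >= limit_gb:
                # Current index is full: start a new one for later data.
                plan.append({})
                used = 0.0
            take = min(remaining, limit_gb - used)
            plan[-1][month] = plan[-1].get(month, 0.0) + take
            used += take
            remaining -= take
    return [("%d-%02d" % (year, seq), months)
            for seq, months in enumerate(plan, start=1)]


if __name__ == "__main__":
    # Made-up volumes: 10 GB every month, 35 GB cap per index.
    for name, months in plan_indices(2014, [10] * 12, 35):
        print(name, months)
```

With uneven monthly volumes the same loop should produce fewer, fuller indices than a fixed one-index-per-month scheme, which is how I understand the point about seasonal variation.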