Hey Mark,
  Thanks for this.  It seems like our best bet will be to manage the indexes
identically across all regions, since they're really mirrors of each other.
Since our documents are immutable, we'll just queue each write and delete for
every region, and each region will apply the operation to its own local index.
It's the only solution I can think of that lets us keep processing when WAN
connections go down.
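
Roughly, each region would run a small consumer that drains its local queue
and applies the operations to the region's own cluster.  Here's a sketch with
the official Python client; the queue message format and field names below are
made up:

    from elasticsearch import Elasticsearch

    # one local ES cluster per region; the queue is region-local too, so a WAN
    # outage only delays delivery of operations, it never blocks local writes
    es = Elasticsearch(["localhost:9200"])

    def apply_operation(op):
        # op is a dict pulled off this region's queue, e.g. (illustrative):
        # {"action": "index", "index": "docs", "type": "doc",
        #  "id": "1", "doc": {...}}
        if op["action"] == "index":
            # documents are immutable, so replaying an index op is harmless
            es.index(index=op["index"], doc_type=op["type"],
                     id=op["id"], body=op["doc"])
        elif op["action"] == "delete":
            # ignore=404 makes an already-applied delete a no-op
            es.delete(index=op["index"], doc_type=op["type"],
                      id=op["id"], ignore=404)

Because the documents are immutable there are no version conflicts to resolve
when operations arrive late or out of order.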


On Tue, Jun 10, 2014 at 3:24 PM, Mark Walkom <ma...@campaignmonitor.com>
wrote:

> There are a few people in the IRC channel who have done it; however,
> cross-WAN clusters are generally not recommended, as ES is sensitive to
> latency.
>
> You may be better off using the snapshot/restore process, or another
> export/import method.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 11 June 2014 03:11, Todd Nine <tn...@apigee.com> wrote:
>
>> Hey guys,
>>   One last question.  Does anyone do multi-region replication with ES?
>>  My current understanding is that with a multi-region cluster, documents
>> will be routed to the region that has a node which "owns" the shard the
>> document is being written to.  In our use cases, our cluster must survive a
>> WAN outage, and we don't want the latency of writes or reads crossing the
>> WAN connection.  Our documents are immutable, so we can work with
>> multi-region writes.  We simply need to replicate the writes, as well as the
>> deletes, to the other regions.  Are there any examples or implementations of
>> this?
>>
>> Thanks,
>> Todd
>>
>> On Thursday, June 5, 2014 4:11:44 PM UTC-6, Jörg Prante wrote:
>>>
>>> Yes, routing is very powerful. The general use case is to introduce a
>>> mapping onto a large number of shards so that related parts of the data
>>> are stored on the same shard, which is good for locality. For example,
>>> combined with index aliases that carry filter terms, you can create one
>>> big concrete index and use segments of it, so that many thousands of users
>>> can share a single index.
>>>
>>> Another use case for routing might be a time-windowed index where each
>>> shard holds a time window. There are many examples of this around Logstash.
>>>
>>> The combination of index aliases and routing is also known as "shard
>>> overallocation". The concept might look complex at first, but the
>>> alternative would be to create a concrete index for every user, which
>>> would also be a waste of resources.
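>>>
>>> A small sketch of that pattern with the Python client (the index, alias,
>>> and field names are only examples):
>>>
>>>     from elasticsearch import Elasticsearch
>>>
>>>     es = Elasticsearch()
>>>
>>>     # one big concrete index, overallocated with shards up front
>>>     es.indices.create(index="users", body={
>>>         "settings": {"number_of_shards": 50, "number_of_replicas": 1}})
>>>
>>>     # per-user alias: the filter limits searches to that user's documents
>>>     # and the routing value keeps all of that user's documents on a
>>>     # single shard
>>>     es.indices.update_aliases(body={"actions": [
>>>         {"add": {"index": "users", "alias": "user_42",
>>>                  "filter": {"term": {"user": "42"}},
>>>                  "routing": "42"}}]})
>>>
>>> Indexing and searching through the "user_42" alias then touches only one
>>> shard instead of all 50.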
>>>
>>> Though some here on this list have managed to run 10000s of shards on a
>>> single node, I still find this breathtaking - a few dozen shards per node
>>> should be OK. Each shard takes some MB on the heap (there are tricks to
>>> reduce this a bit), so a high number of shards consumes noticeable
>>> resources even without executing a single query. There might be other
>>> factors worth considering, for example a size limit for a single shard. It
>>> can be quite handy to let ES move around shards of 1-10 GB instead of a
>>> few 100 GB - it is faster at index recovery and at reallocation time.
>>>
>>> Jörg
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jun 5, 2014 at 9:44 PM, Todd Nine <tn...@apigee.com> wrote:
>>>
>>>> Hey Jorg,
>>>>   Thanks for the reply.  We're using Cassandra heavily in production, so
>>>> I'm very familiar with the scale-out concepts.  What we've seen in all of
>>>> our distributed systems is that at some point you saturate the capacity of
>>>> a single node.  In the case of ES, that limit seems to be shard count.
>>>> Eventually, all 5 shards can become too large for a node to handle updates
>>>> and reads efficiently.  This can be caused by a high number of documents,
>>>> large documents, or both.  Once we reach this state, the index is "full"
>>>> in the sense that the nodes containing its shards can no longer service
>>>> traffic at the rate we need.  We have 2 options.
>>>>
>>>> 1) Get bigger hardware.  We do this occasionally, but it's not ideal
>>>> since this is a distributed system.
>>>>
>>>> 2) Scale out, as you said.  In the case of write throughput it seems
>>>> that we can do this with a pattern of alias + new index, but it's not
>>>> clear to me if that's the right approach.  My initial thinking is to
>>>> define some sort of routing that pivots on the created date, sending new
>>>> documents to the new index, since that's an immutable field.  Thoughts?
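>>>>
>>>> Something along these lines, just as a rough sketch (the cutover
>>>> timestamp and index names are made up):
>>>>
>>>>     # documents created before the cutover stay in the old index, newer
>>>>     # ones go to the new one; created_at is immutable, so a document
>>>>     # always maps to the same index and deletes go to the right place
>>>>     CUTOVER = 1401926400000  # epoch millis when the new index went live
>>>>
>>>>     def target_index(doc):
>>>>         if doc["created_at"] < CUTOVER:
>>>>             return "app1-index1"
>>>>         return "app1-index2"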
>>>>
>>>> In the case of read throughput, we can create more replicas.  Our
>>>> system is about 50/50 now: some users have an even read/write split,
>>>> others are very read heavy.  I'll probably come up with 2 indexing
>>>> strategies we can apply to an application's index based on heuristics
>>>> from the operations it performs.
>>>>
>>>>
>>>> Thanks for the feedback!
>>>> Todd
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 10:55 AM, joerg...@gmail.com <joerg...@gmail.com
>>>> > wrote:
>>>>
>>>>> Thanks for raising the questions, I will come back later in more
>>>>> detail.
>>>>>
>>>>> Just a quick note: the idea of "shards scale writes" and "replicas
>>>>> scale reads" is correct, but Elasticsearch is also "elastic", which means
>>>>> it "scales out" by adding node hardware. The shard/replica scaling
>>>>> pattern finds its limits in the node hardware, because shards and
>>>>> replicas are tied to machines, and that is where the hard resource
>>>>> constraints live, mostly disk I/O and memory.
>>>>>
>>>>> In the end, you can take this as a rule of thumb:
>>>>>
>>>>> - add replicas to scale "read" load
>>>>> - add new indices (i.e. new shards) to scale "write" load
>>>>> - and add nodes to scale out the whole cluster for both read and write
>>>>> load
>>>>>
>>>>> More later,
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 5, 2014 at 7:17 PM, Todd Nine <tn...@apigee.com> wrote:
>>>>>
>>>>>> Hey Jörg,
>>>>>>   Thank you for your response.  A few questions/points.
>>>>>>
>>>>>> In our use cases, the inability to write or read is considered
>>>>>> downtime.  Therefore, I cannot disable writes during expansion.  Your
>>>>>> points about aliases give me some interesting research to do, and I have
>>>>>> a few follow-up questions.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Our systems are fully multi-tenant.  We currently intend to have 1
>>>>>> index per application.  Each application can have a large number of
>>>>>> types within its index, and each cluster could potentially hold 1000 or
>>>>>> more applications/indexes.  Most users will never need more than 5
>>>>>> shards.  Some users are huge power users, with billions of documents per
>>>>>> type and several large types.  These are the users I am concerned about.
>>>>>>
>>>>>> From my understanding and experimentation, Elasticsearch has 2 primary
>>>>>> mechanisms for tuning performance to handle the load.  First is the
>>>>>> shard count: the higher the shard count, the more writes you can accept.
>>>>>> Each shard has a primary which accepts the write and replicates it to
>>>>>> its replicas.  For high write throughput, you increase the number of
>>>>>> shards to distribute the load across more nodes.  For read throughput,
>>>>>> you increase the replica count.  This gives you higher read performance,
>>>>>> since you now have more than 1 node per shard that you can query for
>>>>>> results.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Per your suggestion, rather than copy the documents from an index with
>>>>>> 5 shards to an index with 10 shards, I can theoretically create a new
>>>>>> index and then add it to the alias. For instance, I envision this
>>>>>> working in the following way.
>>>>>>
>>>>>>
>>>>>> Aliases:
>>>>>> app1-read
>>>>>> app1-write
>>>>>>
>>>>>>
>>>>>> Initial creation:
>>>>>>
>>>>>> app1-read -> Index: App1-index1 (5 shards, 2 replicas)
>>>>>> app1-write -> Index: App1-index1
>>>>>>
>>>>>> User begins to have too much data in App1-index1.  The 5 shards are
>>>>>> causing hotspots. The following actions take place.
>>>>>>
>>>>>> 1) Create App1-index2 (5 shards, 2 replicas)
>>>>>>
>>>>>> 2) Update app1-write -> App1-index1 and App1-index2
>>>>>>
>>>>>> 3) Update app1-read -> App1-index1 and App1-index2
>>>>>>
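>>>>>> I'd expect steps 1-3 to look roughly like this with the Python client
>>>>>> (just a sketch, using the names above):
>>>>>>
>>>>>>     from elasticsearch import Elasticsearch
>>>>>>
>>>>>>     es = Elasticsearch()
>>>>>>
>>>>>>     # 1) create the new index
>>>>>>     es.indices.create(index="app1-index2", body={
>>>>>>         "settings": {"number_of_shards": 5, "number_of_replicas": 2}})
>>>>>>
>>>>>>     # 2) and 3) add the new index to both aliases; all actions in one
>>>>>>     # update_aliases call are applied atomically
>>>>>>     es.indices.update_aliases(body={"actions": [
>>>>>>         {"add": {"index": "app1-index2", "alias": "app1-write"}},
>>>>>>         {"add": {"index": "app1-index2", "alias": "app1-read"}}]})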
>>>>>>
>>>>>> I have some uncertainty around this I could use help with.
>>>>>>
>>>>>> Once the new index has been added, how are requests routed?  For
>>>>>> instance, if I have a document "doc1" in App1-index1, and I delete it
>>>>>> after adding the new index, is the alias smart enough to update only
>>>>>> App1-index1, or will it broadcast the operation to both indexes?  In
>>>>>> other words, if I create an alias over 2 or more indexes, will the alias
>>>>>> perform routing, or is every operation broadcast to all indexes?  How
>>>>>> does this scale in practice?
>>>>>>
>>>>>>
>>>>>> In my current understanding, using an alias on the read side is simply
>>>>>> the same as if you had doubled the shards.  A 10-shard (replication 2)
>>>>>> index is functionally equivalent to an alias that aggregates 2 indexes
>>>>>> with 5 shards and 2 replicas each.  All 10 shards (or one of their
>>>>>> replicas) would need to be read to aggregate the results, correct?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Lastly, we have a multi-region requirement.  We are eventually
>>>>>> consistent by design between regions.  We want documents written in one
>>>>>> region to be replicated to the others for query.  All documents are
>>>>>> immutable by design, so we don't get document version collisions between
>>>>>> data centers.  What mechanisms are currently used in production
>>>>>> environments to replicate indexes across regions?  I just can't seem to
>>>>>> find any.  Rivers were my initial thought, so that regions can pull data
>>>>>> from other regions.  However, if that isn't a good fit, what is?
>>>>>>
>>>>>> Thanks guys!
>>>>>> Todd
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  On Thu, Jun 5, 2014 at 9:21 AM, joerg...@gmail.com <
>>>>>> joerg...@gmail.com> wrote:
>>>>>>
>>>>>>> The knapsack plugin does not require downtime. You can increase the
>>>>>>> shard count on the fly by copying an index over to another index (even
>>>>>>> on another cluster). The index should be write-disabled during the
>>>>>>> copy, though.
>>>>>>>
>>>>>>> Increasing replica level is a very simple command, no index copy
>>>>>>> required.
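>>>>>>>
>>>>>>> For example, a sketch with the Python client ("myindex" is just a
>>>>>>> placeholder name):
>>>>>>>
>>>>>>>     from elasticsearch import Elasticsearch
>>>>>>>
>>>>>>>     es = Elasticsearch()
>>>>>>>
>>>>>>>     # raise the replica count on a live index; ES allocates the extra
>>>>>>>     # replica shards in the background, no reindexing involved
>>>>>>>     es.indices.put_settings(index="myindex",
>>>>>>>                             body={"number_of_replicas": 2})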
>>>>>>>
>>>>>>> It seems you have a slight misconception about controlling replica
>>>>>>> shards. You cannot start dedicated copy actions from only the replicas.
>>>>>>> (By setting _preference, you can do this for search queries.)
>>>>>>>
>>>>>>> Maybe I do not understand your question, but what do you mean by
>>>>>>> "dual writes"? And why would you "move" an index?
>>>>>>>
>>>>>>> Please check out index aliases. The concept of index aliases allows
>>>>>>> redirecting index names in the API with a single atomic command.
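>>>>>>>
>>>>>>> For example, switching an alias from an old index to a new one is one
>>>>>>> atomic call (a sketch, with made-up index names):
>>>>>>>
>>>>>>>     from elasticsearch import Elasticsearch
>>>>>>>
>>>>>>>     es = Elasticsearch()
>>>>>>>
>>>>>>>     # both actions are applied as one atomic alias change
>>>>>>>     es.indices.update_aliases(body={"actions": [
>>>>>>>         {"remove": {"index": "myindex_v1", "alias": "myindex"}},
>>>>>>>         {"add": {"index": "myindex_v2", "alias": "myindex"}}]})
>>>>>>>
>>>>>>> Clients keep using the "myindex" name and never notice the switch.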
>>>>>>>
>>>>>>> It will be tough to monitor an index that is outgrowing its cluster,
>>>>>>> since there is no clear signal of the form "this cluster's capacity is
>>>>>>> full because the index is too large or overloaded, please add nodes
>>>>>>> now". In real life, heaps will fill up here and there, latency will
>>>>>>> increase, and all of a sudden queries or indexing will congest now and
>>>>>>> then. If you encounter this, you have no time to copy an old index to a
>>>>>>> new one - the copy process also takes resources, and the cluster may
>>>>>>> not have enough left. You must begin to add nodes well before the
>>>>>>> capacity limit is reached.
>>>>>>>
>>>>>>> Instead of copying an index, which is a burden, you should consider
>>>>>>> managing a set of indices. If an old index has become too small, just
>>>>>>> start a new one that is bigger, has more shards, and spans more nodes,
>>>>>>> and add it to the existing set of indices. With an index alias you can
>>>>>>> combine many indices under one index name. This is very powerful.
>>>>>>>
>>>>>>> If you cannot estimate the data growth rate, I also recommend using a
>>>>>>> generous number of shards from the very start. Say, if you expect to
>>>>>>> eventually run ES nodes on 50 servers, then simply start with 50 shards
>>>>>>> on a small number of servers and add servers over time. If you choose
>>>>>>> such a strategy, you won't have to worry about the shard count for a
>>>>>>> very long time.
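>>>>>>>
>>>>>>> For example (again only a sketch, with a made-up index name):
>>>>>>>
>>>>>>>     from elasticsearch import Elasticsearch
>>>>>>>
>>>>>>>     es = Elasticsearch()
>>>>>>>
>>>>>>>     # 50 shards can start out packed onto a handful of nodes; as
>>>>>>>     # servers join the cluster, ES relocates shards onto them without
>>>>>>>     # any reindexing
>>>>>>>     es.indices.create(index="bigindex", body={
>>>>>>>         "settings": {"number_of_shards": 50, "number_of_replicas": 1}})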
>>>>>>>
>>>>>>> Do not think about rivers; they are not built for such use cases.
>>>>>>> Rivers are designed as a "play tool" for fetching data quickly from
>>>>>>> external sources, for demo purposes. They are discouraged for serious
>>>>>>> production use, as they are not very reliable when they run unattended.
>>>>>>>
>>>>>>> Jörg
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 5, 2014 at 7:33 AM, Todd Nine <tn...@apigee.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> 2) https://github.com/jprante/elasticsearch-knapsack might do
>>>>>>>>> what you want.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This won't quite work for us.  We can't have any downtime, so it
>>>>>>>> seems like an A/B system is more appropriate.  What we're currently
>>>>>>>> thinking is the following.
>>>>>>>>
>>>>>>>> Each index has 2 aliases, a read and a write alias.
>>>>>>>>
>>>>>>>> 1) Both the read and write aliases point to an initial index, say
>>>>>>>> shard count 5, replication 2 (ES is not our canonical data source, so
>>>>>>>> we're OK with reconstructing search data).
>>>>>>>>
>>>>>>>> 2) We detect via monitoring that we're going to outgrow an index. We
>>>>>>>> create a new index with more shards, and potentially a higher
>>>>>>>> replication level depending on read load.  We then update the write
>>>>>>>> alias to point to both the old and the new index.  All clients will
>>>>>>>> then begin dual writes to both indexes.
>>>>>>>>
>>>>>>>> 3) While we're writing to both old and new, some process (maybe a
>>>>>>>> river?) will begin copying documents updated before the write-alias
>>>>>>>> change from the old index to the new index.  Ideally, each replica
>>>>>>>> would copy only its local documents into the new index.  We'll want to
>>>>>>>> throttle this as well.  Each node will need additional operational
>>>>>>>> capacity to accommodate the dual writes as well as accepting the
>>>>>>>> writes of the "old" documents.  I'm concerned that if we push this
>>>>>>>> through too fast, we could cause interruptions of service.
>>>>>>>>
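>>>>>>>> As a rough sketch of that copy step with the Python client helpers
>>>>>>>> (the created_at cutoff field and timestamp are hypothetical):
>>>>>>>>
>>>>>>>>     from elasticsearch import Elasticsearch, helpers
>>>>>>>>
>>>>>>>>     es = Elasticsearch()
>>>>>>>>     CUTOVER = 1401926400000  # millis of the write-alias change (made up)
>>>>>>>>
>>>>>>>>     # pull every "old" document out of the old index via scan/scroll
>>>>>>>>     old_docs = helpers.scan(es, index="app1-index1", query={
>>>>>>>>         "query": {"range": {"created_at": {"lt": CUTOVER}}}})
>>>>>>>>
>>>>>>>>     # bulk-index them into the new index; small chunks make it easier
>>>>>>>>     # to throttle the extra load
>>>>>>>>     actions = ({"_index": "app1-index2", "_type": hit["_type"],
>>>>>>>>                 "_id": hit["_id"], "_source": hit["_source"]}
>>>>>>>>                for hit in old_docs)
>>>>>>>>     helpers.bulk(es, actions, chunk_size=500)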
>>>>>>>>
>>>>>>>> 4) Once the copy is complete, the read alias is moved to the new
>>>>>>>> index, and then the old index is removed from the system.
>>>>>>>>
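>>>>>>>> The cutover itself would then be one atomic alias update plus dropping
>>>>>>>> the old index (continuing the sketch above):
>>>>>>>>
>>>>>>>>     # move reads to the new index and retire the old one atomically
>>>>>>>>     es.indices.update_aliases(body={"actions": [
>>>>>>>>         {"remove": {"index": "app1-index1", "alias": "app1-read"}},
>>>>>>>>         {"remove": {"index": "app1-index1", "alias": "app1-write"}},
>>>>>>>>         {"add": {"index": "app1-index2", "alias": "app1-read"}}]})
>>>>>>>>
>>>>>>>>     es.indices.delete(index="app1-index1")
>>>>>>>>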
>>>>>>>> Could such a process be implemented as a plugin?  If the work can
>>>>>>>> happen in parallel across all nodes containing a shard, we can
>>>>>>>> increase the process's speed dramatically.  If we have a single
>>>>>>>> worker, like a river, it might take too long.
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
